Gapped And Tunable Repeat Units For Use In Genome Editing And Gene Regulation Compositions

ABSTRACT

Provided herein are DNA binding domains comprising a plurality of repeat units, wherein each repeat unit is expanded or contracted in length. Also provided herein are DNA binding domains comprising a plurality of repeat units, wherein each repeat unit is separated from a neighboring repeat unit by a linker. In certain aspects, the linker includes a recognition site. Also disclosed are DNA binding proteins that include a fragment of the N-cap sequence of a TALE protein. The TALE protein may be a Xanthomonas TALE protein.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority pursuant to 35 U.S.C. § 119(e) to U.S. Provisional Application No. 62/690,890, filed Jun. 27, 2018, U.S. Provisional Application No. 62/716,217, filed Aug. 8, 2018, and U.S. Provisional Application No. 62/852,158, filed May 23, 2019, the disclosures of which are incorporated herein by reference in their entirety.

INCORPORATION BY REFERENCE OF SEQUENCE LISTING PROVIDED AS A TEXT FILE

A Sequence Listing is provided herewith as a text file, “ALTI-724 SeqList_ST25.txt,” created on Aug. 9, 2021 and having a size of 486 KB. The contents of the text file are incorporated by reference herein in their entirety.

INTRODUCTION

Genome editing and gene regulation techniques require the development of nucleic acid binding domains having strong and specific binding to target genes. Provided herein are DNA binding domains with tunable binding activity. Additionally, genome editing and gene regulation compositions having functional linker regions are provided, yielding compositions that exhibit dual activities. Also provided herein are compositions and methods for genome editing and gene regulation, where the nucleic acid binding domain is derived from DNA binding proteins from bacteria from the genus Xanthomonas.

SUMMARY

In various aspects, the present disclosure provides a composition comprising a modular nucleic acid binding domain comprising a plurality of repeat units, wherein a repeat unit of the plurality of repeat units recognizes a target nucleic acid base and wherein the plurality of repeat units has one or more of the following characteristics: (a) at least one repeat unit comprising greater than 39 amino acid residues; (b) at least one repeat unit comprising greater than 35 amino acid residues derived from the genus of Ralstonia; (c) at least one repeat unit comprising less than 32 amino acid residues; and (d) each repeat unit of the plurality of repeat units is separated from a neighboring repeat unit by a linker comprising a recognition site.

In some aspects, the at least one repeat unit comprises an amino acid selected from glycine, alanine, threonine or histidine at a position after an amino acid residue at position 35. In some aspects, the at least one repeat unit comprises an amino acid selected from glycine, alanine, threonine or histidine at a position after an amino acid residue at position 39. In some aspects, the recognition site is for a small molecule, a protease, or a kinase. In some aspects, the recognition site serves as a localization signal.

In further aspects, the composition further comprises a cleavage domain linked to the modular nucleic acid binding domain to form a non-naturally occurring fusion protein. In some aspects, the modular nucleic acid binding domain comprises a potency for a target site greater than 65% and a specificity ratio for the target site of 50:1; and a functional domain; wherein the modular nucleic acid binding domain comprises a plurality of repeat units, wherein at least one repeat unit of the plurality comprises a binding region configured to bind to a target nucleic acid base in the target site, wherein the potency comprises indel percentage at the target site, and wherein the specificity ratio comprises indel percentage at the target site over indel percentage at a top-ranked off-target site of the non-naturally occurring fusion protein.
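The potency and specificity ratio defined above reduce to simple arithmetic on two indel percentages. The following is a minimal sketch; the function names, the example percentages, and the reading of "50:1" as a minimum ratio are assumptions made for illustration and are not taken from the disclosure.

```python
def specificity_ratio(on_target_indel_pct: float, off_target_indel_pct: float) -> float:
    """Indel % at the target site over indel % at the top-ranked off-target site."""
    if off_target_indel_pct <= 0:
        # No detectable off-target indels: the ratio is unbounded.
        return float("inf")
    return on_target_indel_pct / off_target_indel_pct


def meets_thresholds(on_target_indel_pct: float,
                     off_target_indel_pct: float,
                     min_potency_pct: float = 65.0,
                     min_ratio: float = 50.0) -> bool:
    """Check potency > 65% and ratio >= 50 (the ">=" reading is an assumption)."""
    potent = on_target_indel_pct > min_potency_pct
    specific = specificity_ratio(on_target_indel_pct, off_target_indel_pct) >= min_ratio
    return potent and specific


# Hypothetical deep-sequencing readouts (percent indels), for illustration only.
print(specificity_ratio(80.0, 1.0))   # on-target 80%, off-target 1%
print(meets_thresholds(80.0, 1.0))
print(meets_thresholds(80.0, 4.0))    # ratio of 20 falls short of 50
```

Keeping potency and the ratio as separate checks mirrors the text, which treats them as two distinct properties of the fusion protein.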

In some aspects, the repeat unit comprises a sequence of A₁₋₁₁X₁X₂B₁₄₋₃₅, wherein each amino acid residue of A₁₋₁₁ comprises any amino acid residue; wherein X₁X₂ comprises the binding region; wherein each amino acid residue of B₁₄₋₃₅ comprises any amino acid; and wherein a first repeat unit of the plurality of repeat units comprises at least one residue in A₁₋₁₁, B₁₄₋₃₅, or a combination thereof that differs from a corresponding residue in a second repeat unit of the plurality of repeat units.
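The A₁₋₁₁X₁X₂B₁₄₋₃₅ layout can be illustrated by slicing a repeat unit at fixed positions and listing where two units differ outside the binding region. This is a sketch only: the helper names are hypothetical, and the two sequences are SEQ ID NO: 209 and SEQ ID NO: 218 as listed in TABLE 2.

```python
from typing import List, NamedTuple


class RepeatUnit(NamedTuple):
    a: str    # residues 1-11 (A region)
    rvd: str  # residues 12-13 (X1X2, the binding region)
    b: str    # residues 14 onward (B region)


def split_repeat_unit(seq: str) -> RepeatUnit:
    """Split a repeat unit into its A, X1X2 and B segments by position."""
    if len(seq) < 14:
        raise ValueError("repeat unit too short to contain A, X1X2 and B segments")
    return RepeatUnit(a=seq[:11], rvd=seq[11:13], b=seq[13:])


def differing_framework_positions(unit1: str, unit2: str) -> List[int]:
    """1-based positions outside X1X2 (12-13) where two equal-length units differ."""
    if len(unit1) != len(unit2):
        raise ValueError("this sketch compares equal-length units only")
    return [i + 1 for i, (x, y) in enumerate(zip(unit1, unit2))
            if x != y and i not in (11, 12)]


SEQ_209 = "LSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYA"  # from TABLE 2
SEQ_218 = "LSTEQVVAIASHNGGKQALEAVKADLLELRGAPYA"  # from TABLE 2

print(split_repeat_unit(SEQ_209))
print(differing_framework_positions(SEQ_209, SEQ_218))
```

Both example units share the same A region and differ only in the binding region and the B region, which is one way the "differs from a corresponding residue" condition can be met.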

In some aspects, the binding region comprises an amino acid residue at position 13 or an amino acid residue at position 12 and the amino acid residue at position 13. In further aspects, the amino acid residue at position 13 binds to the target nucleic acid base. In still further aspects, the amino acid residue at position 12 stabilizes the configuration of the binding region. In some aspects, the indel percentage is measured by deep sequencing. In some aspects, the modular nucleic acid binding domain further comprises one or more properties selected from the following: (a) binds the target site, wherein the target site comprises a 5′ guanine; (b) comprises from 7 repeat units to 25 repeat units; and (c) upon binding to the target site, the modular nucleic acid binding domain is separated from a second modular nucleic acid binding domain bound to a second target site by from 2 to 50 base pairs.

In some aspects, the plurality of repeat units comprises a Ralstonia repeat unit, a Xanthomonas repeat unit, a Legionella repeat unit, or any combination thereof. In further aspects, the Ralstonia repeat unit is a Ralstonia solanacearum repeat unit, the Xanthomonas repeat unit is a Xanthomonas spp. repeat unit, and the Legionella repeat unit is a Legionella quateirensis repeat unit. In still further aspects, the B₁₄₋₃₅ of at least one repeat unit of the plurality of repeat units has at least 92% sequence identity to GGKQALEAVRAQLLDLRAAPYG (SEQ ID NO: 280).
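To give a sense of what a 92% identity threshold means for a 22-residue B region, the sketch below computes a simple position-by-position identity. This passage does not state how identity is calculated, so the ungapped, equal-length comparison and the function name are assumptions; the repeat units are SEQ ID NO: 203 and SEQ ID NO: 209 from TABLE 2.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped, position-wise identity between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch compares equal-length sequences only")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


SEQ_280 = "GGKQALEAVRAQLLDLRAAPYG"                     # reference B region
B_203 = "LSTAQVVAIATRSGGKQALEAVRAQLLDLRAAPYG"[13:]      # residues 14-35 of SEQ ID NO: 203
B_209 = "LSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYA"[13:]      # residues 14-35 of SEQ ID NO: 209

print(percent_identity(B_203, SEQ_280))
print(round(percent_identity(B_209, SEQ_280), 1))

# Largest number of substitutions that still leaves at least 92% identity.
max_substitutions = int(len(SEQ_280) * (100 - 92) / 100)
print(max_substitutions)
```

Under this simple measure, a 22-residue B region can carry at most one substitution relative to SEQ ID NO: 280 and still meet the 92% threshold, since two substitutions give 20/22, or about 90.9%.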

In some aspects, the binding region comprises HD binding to cytosine, NG binding to thymidine, NK binding to guanine, SI binding to adenosine, RS binding to adenosine, HN binding to guanine, or NT binding to adenosine. In some aspects, the at least one repeat unit comprises any one of SEQ ID NO: 267-SEQ ID NO: 279. In some aspects, the at least one repeat unit comprises at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity with any one of SEQ ID NO: 168-SEQ ID NO: 263. In some aspects, the at least one repeat unit comprises at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity with SEQ ID NO: 209, SEQ ID NO: 197, SEQ ID NO: 233, SEQ ID NO: 253, SEQ ID NO: 203, or SEQ ID NO: 218. In some aspects, the at least one repeat unit comprises any one of SEQ ID NO: 168-SEQ ID NO: 263. In some aspects, the at least one repeat unit comprises SEQ ID NO: 209, SEQ ID NO: 197, SEQ ID NO: 233, SEQ ID NO: 253, SEQ ID NO: 203, or SEQ ID NO: 218.
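The binding-region-to-base pairings listed above can be written as a lookup table and applied to an ordered array of repeat units. This is an illustrative sketch: the function name is hypothetical, the one-letter codes C, T, G and A stand for cytosine, thymidine, guanine and adenosine, and the four repeat units are SEQ ID NO: 209, 197, 233 and 253 from TABLE 2.

```python
from typing import Sequence

# Pairings as listed in the text (X1X2 residues -> target base).
RVD_TO_BASE = {
    "HD": "C",
    "NG": "T",
    "NK": "G",
    "SI": "A",
    "RS": "A",
    "HN": "G",
    "NT": "A",
}


def predicted_target(repeat_units: Sequence[str]) -> str:
    """Read residues 12-13 of each repeat unit and map them to a base."""
    bases = []
    for unit in repeat_units:
        rvd = unit[11:13]
        if rvd not in RVD_TO_BASE:
            raise KeyError(f"no pairing listed for binding region {rvd!r}")
        bases.append(RVD_TO_BASE[rvd])
    return "".join(bases)


units = [
    "LSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYA",  # SEQ ID NO: 209 (HD)
    "LSTAQVVAIASNGGGKQALEGIGEQLLKLRTAPYG",  # SEQ ID NO: 197 (NG)
    "LSTEQVVAIASNKGGKQALEAVKAHLLDLLGAPYV",  # SEQ ID NO: 233 (NK)
    "LSTEQVVVIANSIGGKQALEAVKVQLPVLRAAPYE",  # SEQ ID NO: 253 (SI)
]
print(predicted_target(units))
```

The table covers only the pairings named in this passage; any other binding region raises an error rather than guessing a base.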

In some aspects, the target nucleic acid base is cytosine, guanine, thymidine, adenosine, uracil, or a combination thereof. In some aspects, the modular nucleic acid binding domain comprises an N-terminus amino acid sequence, a C-terminus amino acid sequence, or a combination thereof. In further aspects, the N-terminus amino acid sequence is from Xanthomonas spp., Legionella quateirensis, or Ralstonia solanacearum. In still further aspects, the N-terminus amino acid sequence comprises at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 264, SEQ ID NO: 300, SEQ ID NO: 335, SEQ ID NO: 303, SEQ ID NO: 301, SEQ ID NO: 304, SEQ ID NO: 320, SEQ ID NO: 321, or SEQ ID NO: 322. In still further aspects, the N-terminus amino acid sequence comprises SEQ ID NO: 264, SEQ ID NO: 300, SEQ ID NO: 335, SEQ ID NO: 303, SEQ ID NO: 301, SEQ ID NO: 304, SEQ ID NO: 320, SEQ ID NO: 321, or SEQ ID NO: 322.

In some aspects, the C-terminus amino acid sequence is from Xanthomonas spp., Legionella quateirensis, or Ralstonia solanacearum. In further aspects, the C-terminus amino acid sequence comprises at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 266, SEQ ID NO: 298, or SEQ ID NO: 306. In still further aspects, the C-terminus amino acid sequence comprises SEQ ID NO: 266, SEQ ID NO: 298, or SEQ ID NO: 306. In some aspects, the C-terminus amino acid sequence serves as a linker between the modular nucleic acid binding domain and a functional domain.

In some aspects, the modular nucleic acid binding domain comprises a half repeat. In some aspects, the half repeat comprises at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% sequence identity to SEQ ID NO: 265, SEQ ID NO: 327-SEQ ID NO: 334, or SEQ ID NO: 290. In some aspects, the functional domain is a cleavage domain or a repression domain. In some aspects, the cleavage domain comprises at least 33.3% divergence from SEQ ID NO: 163 and is immunologically orthogonal to SEQ ID NO: 163. In some aspects, the composition comprises one or more of the following characteristics: (a) induces greater than 1% indels at the target site; (b) the cleavage domain comprises a molecular weight of less than 23 kDa; (c) the cleavage domain comprises less than 196 amino acids; and (d) capable of cleaving across a spacer region greater than 24 base pairs.

In some aspects, the composition induces greater than 5%, greater than 10%, greater than 20%, greater than 30%, greater than 40%, greater than 50%, greater than 60%, greater than 70%, greater than 80%, or greater than 90% indels at the target site. In some aspects, the cleavage domain comprises at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, or at least 75% divergence from SEQ ID NO: 163. In some aspects, the cleavage domain comprises a sequence selected from SEQ ID NO: 316-SEQ ID NO: 319.

In some aspects, the cleavage domain comprises a nucleic acid sequence encoding for a sequence having at least 80% sequence identity with SEQ ID NO: 1-SEQ ID NO: 81. In some aspects, the cleavage domain comprises a nucleic acid sequence encoding for a sequence selected from SEQ ID NO: 1-SEQ ID NO: 81. In some aspects, the nucleic acid sequence comprises at least 80% sequence identity with SEQ ID NO: 82-SEQ ID NO: 162. In some aspects, the nucleotide sequence encoding for the sequence comprises any one of SEQ ID NO: 82-SEQ ID NO: 162.

In some aspects, the repression domain comprises KRAB, Sin3a, LSD1, SUV39H1, G9A (EHMT2), DNMT1, DNMT3A-DNMT3L, DNMT3B, KOX, TGF-beta-inducible early gene (TIEG), v-erbA, SID, MBD2, MBD3, Rb, or MeCP2. In some aspects, the plurality of repeat units comprises 3 to 60 repeat units.

In some aspects, the target site is a nucleic acid sequence within a PDCD1 gene, a CTLA4 gene, a LAG3 gene, a TET2 gene, a BTLA gene, a HAVCR2 gene, a CCR5 gene, a CXCR4 gene, a TRA gene, a TRB gene, a B2M gene, an albumin gene, a HBB gene, a HBA1 gene, a TTR gene, a NR3C1 gene, a CD52 gene, an erythroid specific enhancer of the BCL11A gene, a CBLB gene, a TGFBR1 gene, a SERPINA1 gene, a HBV genomic DNA in infected cells, a CEP290 gene, a DMD gene, a CFTR gene, or an IL2RG gene.

In other aspects, a nucleic acid sequence encoding a chimeric antigen receptor (CAR), alpha-L iduronidase (IDUA), iduronate-2-sulfatase (IDS), or Factor 9 (F9), is inserted at the target site.

In various aspects, the present disclosure provides a method of genome editing, the method comprising: administering any of the above compositions and inducing a double stranded break.

Also provided herein is a non-naturally occurring DNA binding polypeptide that includes from N- to C-terminus: an N-terminus region comprising at least residues N+110 to N+1 of a TALE protein, where the N-terminus region does not include residues N+288 to N+116 of the TALE protein; a plurality of TALE repeat units derived from a TALE protein; and a C-terminus region of a TALE protein. The N-terminus region may not include at least amino acids N+288 to N+116 of the TALE protein. The N-terminus region may not include amino acids N+288 to up to N+116 of the TALE protein. The N-terminus region may not include at least amino acids N+288 to up to N+111 of the TALE protein. The N-terminus region may include residues N+1 to up to N+115 of the TALE protein. The N-terminus region may include residues N+1 to up to N+110 of the TALE protein. The C-terminus region may include a full length C-terminus region of a TALE protein or a fragment thereof, e.g., residues C+1 to C+63 of the TALE protein. The DNA binding polypeptide may be fused to a heterologous functional domain, such as an enzyme, a transcriptional activator, a transcriptional repressor, or a DNA nucleotide modifier. The N-terminus region, the TALE repeat units, and the C-terminus region may be derived from the same TALE protein or from different TALE proteins. The TALE proteins from which the N-terminus region, the TALE repeat units, and the C-terminus region may be derived include Xanthomonas TALE proteins, such as AvrBs3, AVRHAH1, AvrXa7, AVRB6, or AvrXa10.
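The N-terminus truncations described above can be pictured as keeping only the residues closest to the repeat array. The sketch below rests on an assumption that this passage does not spell out: that N+1 denotes the residue immediately preceding the first repeat unit and that numbering increases toward the N-terminus, so a 288-residue N-cap runs from N+288 to N+1. The function name and the placeholder sequence are hypothetical; no real TALE sequence is used.

```python
def ncap_fragment(full_ncap: str, keep: int) -> str:
    """Return residues N+keep ... N+1, assuming full_ncap runs N+len ... N+1."""
    if not 1 <= keep <= len(full_ncap):
        raise ValueError("keep must be between 1 and the N-cap length")
    # Under the assumed numbering, N+1 is the last character of the string.
    return full_ncap[-keep:]


# Synthetic 288-residue placeholder standing in for a full-length N-cap.
placeholder_ncap = "".join("ACDEFGHIKLMNPQRSTVWY"[i % 20] for i in range(288))

fragment_110 = ncap_fragment(placeholder_ncap, 110)   # keeps N+110 ... N+1
fragment_115 = ncap_fragment(placeholder_ncap, 115)   # keeps N+115 ... N+1

print(len(fragment_110), len(placeholder_ncap) - len(fragment_110))
print(len(fragment_115), len(placeholder_ncap) - len(fragment_115))
```

Under that numbering, keeping N+115 to N+1 removes the 173 residues from N+288 to N+116, and keeping N+110 to N+1 removes the 178 residues from N+288 to N+111.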

In various aspects, the present disclosure provides a method of genome editing, the method comprising: administering any of the above polypeptides or compositions thereof and inducing a double stranded break.

In various aspects, the present disclosure provides a method of gene repression, the method comprising administering any of the above polypeptides or compositions thereof and repressing gene expression.

INCORPORATION BY REFERENCE

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference.

BRIEF DESCRIPTION OF DRAWINGS

FIGS. 1A-1C show schematics of the domain structure of DNA binding proteins (not drawn to scale).

FIG. 2 shows nuclease activity mediated by DNA binding protein dimers that each include from N-terminus to C-terminus: an N-terminus region of a TALE protein, TALE repeat units, a C-terminus region of a TALE protein, and a Fok1 endonuclease.

DETAILED DESCRIPTION

The present disclosure provides compositions and methods for genome editing and gene regulation (including activation and repression) with DNA binding domains fused to functional domains via linkers that serve as recognition sites for further activity (e.g., a non-nuclease enzyme activity). The present disclosure also provides compositions and methods for genome editing and gene regulation with DNA binding domains that can have enhanced binding to a target nucleic acid sequence. Enhanced binding to a target nucleic acid sequence can be achieved with the DNA binding domains of the present disclosure in which repeat units can be varied in length to tune for binding activity.

Linkers Comprising Recognition Sites

In some embodiments, the present disclosure provides DNA binding domains with gapped repeat units for use as gene editing complexes. A DNA binding domain with gapped repeat units can comprise a plurality of repeat units in which each repeat unit of the plurality of repeat units is separated from a neighboring repeat unit by a linker. This linker can comprise a recognition site for additional functionality and activity. For example, the linker can comprise a recognition site for a small molecule. As another example, the linker can serve as a recognition site for a protease. In yet another example, the linker can serve as a recognition site for a kinase. In other embodiments, the recognition site can serve as a localization signal.

Each repeat unit of a DNA binding domain (e.g., RNBDs, MAP-NBDs, TALEs) comprises a secondary structure in which the repeat variable diresidue (RVD) interfaces with and binds to a target nucleic acid base on double stranded DNA, while the remainder of the repeat unit protrudes from the surface of the DNA. Thus, the linkers comprising a recognition site between each repeat unit are removed from the surface of the DNA and are solvent accessible. In some embodiments, these solvent accessible linkers comprising recognition sites can have extra activity while mediating gene editing.

Examples of a left and a right DNA binding domain comprising repeat units derived from Xanthomonas spp. are shown below in TABLE 1 for AAVS1 and GA7. “X,” shown in bold and underlining, represents a linker comprising a recognition site and can comprise 1-40 amino acid residues. An amino acid residue of the linker can comprise a glycine, an alanine, a threonine, or a histidine.

In some embodiments, “derived” indicates that a protein is from a particular source (e.g., Ralstonia), is a variant of a protein from a particular source (e.g., Ralstonia), is a mutated or modified form of the protein from a particular source (e.g., Ralstonia), and shares at least 30% sequence identity with, at least 40% sequence identity with, at least 50% sequence identity with, at least 60% sequence identity with, at least 70% sequence identity with, at least 80% sequence identity with, or at least 90% sequence identity with a protein from a particular source (e.g., Ralstonia, Xanthomonas, or Legionella).

TABLE 1 Exemplary Left or Right Gapped DNA Binding Domains SEQ ID NOConstruct Sequence SEQ ID NO: 307 AAVS1_LeftLTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG X LTPDQVV AIASHDGGKQALETVQRLLPVLCQDHGX LTPDQVVAIASHDG GKQALETVQRLLPVLCQDHG X LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASNGGGKQALETVQRLLPV LCQDHG XLTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHGX LTPDQVV AIASNIGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASHDGGKQALETV QRLLPVLCQDHG XLTPDQVVAIASHDGGKQALETVQRLLPVL CQDHG X LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHGX L TPDQVVAIASNIGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASNIGGK QALETVQRLLPVLCQDHG XLTPDQVVAIASNHGGKQALETVQ RLLPVLCQDHGXLTPDQVVAIASNGGG SEQ ID NO: 308AAVS1_Right LTPDQVVAIASNGGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASNGGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASNGG GKQALETVQRLLPVLCQDHG XLTPDQVVAIASHDGGKQALET VQRLLPVLCQDHG X LTPDQVVAIASNGGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASNHGGKQALETVQRLLPVLCQDHG XLTPDQVVAIASNGGGKQALETVQRLLPVLCQDHG X LTPDQVV AIASHDGGKQALETVQRLLPVLCQDHGX LTPDQVVAIASNIGG KQALETVQRLLPVLCQDHG X LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASHDGGKQALETVQRLLPVL CQDHG XLTPDQVVAIASNIGGKQALETVQRLLPVLCQDHG X L TPDQVVAIASNIGGKQALETVQRLLPVLCQDHGX LTPDQVVAI ASNGGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASHDGGKQALETVQ RLLPVLCQDHG XLTPDQVVAIASNGGGKQALESIVAQLSRPDP ALA SEQ ID NO: 309 GA7.2 LeftLTPDQVVAIASNHGGKQALETVQRLLPVLCQDHG X LTPDQVV AIASHDGGKQALETVQRLLPVLCQDHGX LTPDQVVAIASNGG GKQALETVQRLLPVLCQDHG X LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASNIGGKQALETVQRLLPVL CQDHG XLTPDQVVAIASNHGGKQALETVQRLLPVLCQDHG X L TPDQVVAIASHDGGKQALETVQRLLPVLCQDHGX LTPDQVVA IASHDGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASNIGGKQALETV QRLLPVLCQDHG XLTPDQVVAIASNHGGKQALETVQRLLPVL CQDHG X LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHGX L TPDQVVAIASNGGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASNIGGK QALETVQRLLPVLCQDHG XLTPDQVVAIASNHGGKQALETVQ RLLPVLCQDHG X 
LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG X LT PDQVVAIASNGGGKSEQ ID NO: 310 GA7.2 Right LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASHDG GKQALETVQRLLPVLCQDHG XLTPDQVVAIASHDGGKQALET VQRLLPVLCQDHG X LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASNGGGKQALETVQRLLPVLCQDHG XLTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG X LTPDQVV AIASNGGGKQALETVQRLLPVLCQDHGX LTPDQVVAIASHDG GKQALETVQRLLPVLCQDHG X LTPDQVVAIASNIGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASNGGGKQALETVQRLLPV LCQDHG XLTPDQVVAIASNGGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHGX LTPDQVV AIASNGGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASNGGGKQALET VQRLLPVLCQDHG XLTPDQVVAIASNIGGKQALETVQRLLPVL CQDHG X LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHGX L TPDQVVAIASHDGGKQALETVQRLLPVLCQDHG X LTPDQVVAIASNIGGKQALETVQRLLPVLCQDHG X LTPDQVVASASNGGG KQALESIVAQLSRPDPALA
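The gapped layout in TABLE 1, in which repeat units alternate with a linker "X", can be sketched as a string assembly. The repeat unit framework below is the Xanthomonas-derived one that recurs throughout TABLE 1; the function name, the choice of binding regions, and the four-residue linker "GATH" are hypothetical and serve only to illustrate the 1-40 residue linker range stated above.

```python
from typing import Sequence

# Framework common to the full repeat units in TABLE 1:
# residues 1-11, then the two-residue binding region, then residues 14-34.
XANTHOMONAS_PREFIX = "LTPDQVVAIAS"
XANTHOMONAS_SUFFIX = "GGKQALETVQRLLPVLCQDHG"


def build_gapped_domain(rvds: Sequence[str], linker: str) -> str:
    """Join repeat units so each is separated from its neighbor by the linker."""
    if not 1 <= len(linker) <= 40:
        raise ValueError("linker must be 1-40 amino acid residues long")
    for rvd in rvds:
        if len(rvd) != 2:
            raise ValueError(f"binding region must be two residues, got {rvd!r}")
    units = [XANTHOMONAS_PREFIX + rvd + XANTHOMONAS_SUFFIX for rvd in rvds]
    return linker.join(units)


# Hypothetical three-unit array with a hypothetical linker.
example = build_gapped_domain(["HD", "NG", "NI"], linker="GATH")
print(example)
print(len(example))
```

The sketch places a linker only between neighboring full repeat units; the constructs in TABLE 1 additionally end with a truncated final repeat, which is not modeled here.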

Tunable Repeat Units

In some embodiments, the present disclosure provides DNA binding domains (e.g., RNBDs, MAP-NBDs, TALEs) with expanded repeat units. For example, a DNA binding domain (e.g., RNBDs, MAP-NBDs, TALEs) comprises a plurality of repeat units in which each repeat unit is usually 33-35 amino acid residues in length. The present disclosure provides repeat units, which are greater than 35 amino acid residues in length. In some embodiments, the present disclosure provides repeat units, which are greater than 39 amino acid residues in length. In some embodiments, the present disclosure provides repeat units which are 35 to 40 amino acid residues long, 39 to 40 amino acid residues long, 35 to 45 amino acid residues long, 39 to 45 amino acid residues long, 35 to 50 amino acid residues long, 39 to 50 amino acid residues long, 35 to 60 amino acid residues long, 39 to 60 amino acid residues long, 35 to 70 amino acid residues long, 39 to 70 amino acid residues long, 35 to 79 amino acid residues long, or 39 to 79 amino acid residues long.

In other embodiments, the present disclosure provides DNA binding domains (e.g., RNBDs, MAP-NBDs, TALEs) with contracted repeat units. For example, the present disclosure provides repeat units, which are less than 32 amino acid residues in length. In some embodiments, the present disclosure provides repeat units, which are 15 to 32 amino acid residues in length, 16 to 32 amino acid residues in length, 17 to 32 amino acid residues in length, 18 to 32 amino acid residues in length, 19 to 32 amino acid residues in length, 20 to 32 amino acid residues in length, 21 to 32 amino acid residues in length, 22 to 32 amino acid residues in length, 23 to 32 amino acid residues in length, 24 to 32 amino acid residues in length, 25 to 32 amino acid residues in length, 26 to 32 amino acid residues in length, 27 to 32 amino acid residues in length, 28 to 32 amino acid residues in length, 29 to 32 amino acid residues in length, 30 to 32 amino acid residues in length, or 31 to 32 amino acid residues in length.
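The length classes described above (usual repeat units of 33-35 residues, expanded units of more than 35 residues, contracted units of fewer than 32 residues) can be summarized as a small classifier. This is a sketch with hypothetical labels; a length of exactly 32 is not assigned to a class by the thresholds quoted above, so it is reported separately rather than guessed.

```python
def classify_repeat_length(n_residues: int) -> str:
    """Label a repeat unit by length using the thresholds stated in the text."""
    if n_residues < 1:
        raise ValueError("length must be a positive number of residues")
    if n_residues < 32:
        return "contracted"
    if n_residues > 35:
        return "expanded"
    if 33 <= n_residues <= 35:
        return "typical"
    return "unclassified"  # exactly 32 residues


# A repeat unit from TABLE 1 (34 residues) and two hypothetical lengths.
print(classify_repeat_length(len("LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG")))
print(classify_repeat_length(40))
print(classify_repeat_length(28))
```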

In some embodiments, said expanded repeat units can be tuned to modulate binding of each repeat unit to its target nucleic acid, resulting in the ability to overall modulate binding of the DNA binding domain to a target gene of interest. For example, expanding repeat units can improve binding affinity of the repeat unit to its target nucleic acid base and thereby increase binding affinity of the DNA binding domain to a target gene. In some embodiments, expanding repeat units can improve specificity of the DNA binding domain for a target gene. In other embodiments, contracting repeat units can improve binding affinity of the repeat unit to its target nucleic acid base and thereby increase binding affinity of the DNA binding domain for a target gene.

Described in further detail below are DNA binding domains from the genus of Ralstonia, the genus of animal pathogens (e.g., Legionella, Burkholderia, Paraburkholderia, or Francisella), and the genus of Xanthomonas, which can comprise linkers comprising recognition sites, expanded repeat units, or contracted repeat units, as described in detail above.

In some embodiments, the present disclosure provides a composition comprising a modular nucleic acid binding domain comprising a plurality of repeat units, wherein a repeat unit of the plurality of repeat units recognizes a target nucleic acid base and wherein the plurality of repeat units has one or more of the following characteristics: (a) at least one repeat unit comprising greater than 39 amino acid residues; (b) at least one repeat unit comprising greater than 35 amino acid residues derived from the genus of Ralstonia; (c) at least one repeat unit comprising less than 32 amino acid residues; and (d) each repeat unit of the plurality of repeat units is separated from a neighboring repeat unit by a linker comprising a recognition site.

In some embodiments, the at least one repeat unit comprises an amino acid selected from glycine, alanine, threonine or histidine at a position after an amino acid residue at position 35. In some embodiments, the at least one repeat unit comprises an amino acid selected from glycine, alanine, threonine or histidine at a position after an amino acid residue at position 39. In some aspects, the recognition site is for a small molecule, a protease, or a kinase. In some aspects, the recognition site serves as a localization signal.

Ralstonia-Derived DNA Binding Domains

The present disclosure provides modular nucleic acid binding domains (NBDs) derived from bacteria. For example, in some embodiments, the present disclosure provides NBDs derived from bacteria that serve as plant pathogens, such as from the genera Xanthomonas and Ralstonia. In particular embodiments, the present disclosure provides NBDs from the genus of Ralstonia. Also provided herein are NBDs from the animal pathogen, Legionella. Provided herein are sequences of repeat units derived from the genus of Ralstonia, which can be linked together to form non-naturally occurring modular nucleic acid binding domains (NBDs), capable of targeting and binding any target nucleic acid sequence (e.g., DNA sequence).

In some embodiments, “modular” indicates that a particular composition, such as a nucleic acid binding domain, comprises a plurality of repeat units that can be switched and replaced with other repeat units. For example, any repeat unit in a modular nucleic acid binding domain can be switched with a different repeat unit. In some embodiments, modularity of the nucleic acid binding domains disclosed herein allows for switching the target nucleic acid base for a particular repeat unit by simply switching it out for another repeat unit. In some embodiments, modularity of the nucleic acid binding domains disclosed herein allows for swapping out a particular repeat unit for another repeat unit to increase the affinity of the repeat unit for a particular target nucleic acid. Overall, the modular nature of the nucleic acid binding domains disclosed herein enables the development of genome editing complexes that can precisely target any nucleic acid sequence of interest.

In particular embodiments, modular nucleic acid binding domains (NBDs), also referred to herein as “DNA binding polypeptides,” are provided herein from Ralstonia solanacearum. In some embodiments, modular nucleic acid binding domains derived from Ralstonia (RNBDs) can be engineered to bind to a target gene of interest for purposes of gene editing or gene regulation. An RNBD can be engineered to target and bind a specific nucleic acid sequence. The nucleic acid sequence can be DNA or RNA.

In some embodiments, the RNBD can comprise a plurality of repeat units, wherein each repeat unit recognizes and binds to a single nucleotide (in DNA or RNA) or base pair. Each repeat unit in the plurality of repeat units can be specifically selected to target and bind to a specific nucleic acid sequence, thus contributing to the modular nature of the DNA binding polypeptide. A non-naturally occurring Ralstonia-derived modular nucleic acid binding domain can comprise a plurality of repeat units, wherein each repeat unit of the plurality of repeat units recognizes a single target nucleotide, base pair, or both.

In some embodiments, the repeat unit of a modular nucleic acid binding domain can be derived from a bacterial protein. For example, the bacterial protein can be a transcription activator-like effector-like protein (TALE-like protein). The bacterial protein can be derived from Ralstonia solanacearum. Repeat units derived from Ralstonia solanacearum can be 33-35 amino acid residues in length. In some embodiments, the repeat unit can be derived from the naturally occurring Ralstonia solanacearum TALE-like protein.

TABLE 2 below shows exemplary repeat units derived from the genus of Ralstonia, which are capable of binding a target nucleic acid.

TABLE 2 Exemplary Ralstonia-derived Repeat Units

SEQ ID NO        Sequence
SEQ ID NO: 168   LDTEQVVAIASHNGGKQALEAVKADLLDLLGAPYV
SEQ ID NO: 169   LDTEQVVAIASHNGGKQALEAVKADLLDLRGAPYA
SEQ ID NO: 170   LDTEQVVAIASHNGGKQALEAVKADLLELRGAPYA
SEQ ID NO: 171   LDTEQVVAIASHNGGKQALEAVKAHLLDLRGAPYA
SEQ ID NO: 172   LNTEQVVAIASHNGGKQALEAVKADLLDLRGAPYA
SEQ ID NO: 173   LNTEQVVAIASNNGGKQALEAVKTHLLDLRGARYA
SEQ ID NO: 174   LNTEQVVAIASNPGGKQALEAVRALFPDLRAAPYA
SEQ ID NO: 175   LNTEQVVAIASSHGGKQALEAVRALFPDLRAAPYA
SEQ ID NO: 176   LNTEQVVAVASNKGGKQALEAVGAQLLALRAVPYA
SEQ ID NO: 177   LNTEQVVAVASNKGGKQALEAVGAQLLALRAVPYE
SEQ ID NO: 178   LSAAQVVAIASHDGGKQALEAVGTQLVALRAAPYA
SEQ ID NO: 179   LSIAQVVAVASRSGGKQALEAVRAQLLALRAAPYG
SEQ ID NO: 180   LSPEQVVAIASNHGGKQALEAVRALFRGLRAAPYG
SEQ ID NO: 181   LSPEQVVAIASNNGGKQALEAVKAQLLELRAAPYE
SEQ ID NO: 182   LSTAQLVAIASNPGGKQALEAIRALFRELRAAPYA
SEQ ID NO: 183   LSTAQLVAIASNPGGKQALEAVRALFRELRAAPYA
SEQ ID NO: 184   LSTAQLVAIASNPGGKQALEAVRAPFREVRAAPYA
SEQ ID NO: 185   LSTAQLVSIASNPGGKQALEAVRALFRELRAAPYA
SEQ ID NO: 186   LSTAQVAAIASHDGGKQALEAVGTQLVVLRAAPYA
SEQ ID NO: 187   LSTAQVATIASSIGGRQALEALKVQLPVLRAAPYG
SEQ ID NO: 188   LSTAQVATIASSIGGRQALEAVKVQLPVLRAAPYG
SEQ ID NO: 189   LSTAQVVAIAANNGGKQALEAVRALLPVLRVAPYE
SEQ ID NO: 190   LSTAQVVAIAGNGGGKQALEGIGEQLLKLRTAPYG
SEQ ID NO: 191   LSTAQVVAIASHDGGKQALEAAGTQLVALRAAPYA
SEQ ID NO: 192   LSTAQVVAIASHDGGKQALEAVGAQLVELRAAPYA
SEQ ID NO: 193   LSTAQVVAIASHDGGKQALEAVGTQLVALRAAPYA
SEQ ID NO: 194   LSTAQVVAIASHDGGNQALEAVGTQLVALRAAPYA
SEQ ID NO: 195   LSTAQVVAIASHNGGKQALEAVKAQLLDLRGAPYA
SEQ ID NO: 196   LSTAQVVAIASNDGGKQALEEVEAQLLALRAAPYE
SEQ ID NO: 197   LSTAQVVAIASNGGGKQALEGIGEQLLKLRTAPYG
SEQ ID NO: 198   LSTAQVVAIASNGGGKQALEGIGEQLRKLRTAPYG
SEQ ID NO: 199   LSTAQVVAIASNPGGKQALEAVRALFRELRAAPYA
SEQ ID NO: 200   LSTAQVVAIASQNGGKQALEAVKAQLLDLRGAPYA
SEQ ID NO: 201   LSTAQVVAIASSHGGKQALEAVRALFRELRAAPYG
SEQ ID NO: 202   LSTAQVVAIASSNGGKQALEAVWALLPVLRATPYD
SEQ ID NO: 203   LSTAQVVAIATRSGGKQALEAVRAQLLDLRAAPYG
SEQ ID NO: 204   LSTAQVVAVAGRNGGKQALEAVRAQLPALRAAPYG
SEQ ID NO: 205   LSTAQVVAVASSNGGKQALEAVWALLPVLRATPYD
SEQ ID NO: 206   LSTAQVVTIASSNGGKQALEAVWALLPVLRATPYD
SEQ ID NO: 207   LSTEQVVAIAGHDGGKQALEAVGAQLVALRAAPYA
SEQ ID NO: 208   LSTEQVVAIASHDGGKQALEAVGAQLVALLAAPYA
SEQ ID NO: 209   LSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYA
SEQ ID NO: 210   LSTEQVVAIASHDGGKQALEAVGGQLVALRAAPYA
SEQ ID NO: 211   LSTEQVVAIASHDGGKQALEAVGTQLVALRAAPYA
SEQ ID NO: 212   LSTEQVVAIASHDGGKQALEAVGVQLVALRAAPYA
SEQ ID NO: 213   LSTEQVVAIASHDGGKQALEAVVAQLVALRAAPYA
SEQ ID NO: 214   LSTEQVVAIASHDGGKQPLEAVGAQLVALRAAPYA
SEQ ID NO: 215   LSTEQVVAIASHGGGKQVLEGIGEQLLKLRAAPYG
SEQ ID NO: 216   LSTEQVVAIASHKGGKQALEGIGEQLLKLRAAPYG
SEQ ID NO: 217   LSTEQVVAIASHNGGKQALEAVKADLLDLRGAPYA
SEQ ID NO: 218   LSTEQVVAIASHNGGKQALEAVKADLLELRGAPYA
SEQ ID NO: 219   LSTEQVVAIASHNGGKQALEAVKAHLLDLRGAPYA
SEQ ID NO: 220   LSTEQVVAIASHNGGKQALEAVKAHLLDLRGVPYA
SEQ ID NO: 221   LSTEQVVAIASHNGGKQALEAVKAHLLELRGAPYA
SEQ ID NO: 222   LSTEQVVAIASHNGGKQALEAVKAQLLDLRGAPYA
SEQ ID NO: 223   LSTEQVVAIASHNGGKQALEAVKAQLLELRGAPYA
SEQ ID NO: 224   LSTEQVVAIASHNGGKQALEAVKAQLPVLRRAPYG
SEQ ID NO: 225   LSTEQVVAIASHNGGKQALEAVKTQLLELRGAPYA
SEQ ID NO: 226   LSTEQVVAIASHNGGKQALEAVRAQLPALRAAPYG
SEQ ID NO: 227   LSTEQVVAIASHNGSKQALEAVKAQLLDLRGAPYA
SEQ ID NO: 228   LSTEQVVAIASNGGGKQALEGIGKQLQELRAAPHG
SEQ ID NO: 229   LSTEQVVAIASNGGGKQALEGIGKQLQELRAAPYG
SEQ ID NO: 230   LSTEQVVAIASNHGGKQALEAVRALFRELRAAPYA
SEQ ID NO: 231   LSTEQVVAIASNHGGKQALEAVRALFRGLRAAPYG
SEQ ID NO: 232   LSTEQVVAIASNKGGKQALEAVKADLLDLRGAPYV
SEQ ID NO: 233   LSTEQVVAIASNKGGKQALEAVKAHLLDLLGAPYV
SEQ ID NO: 234   LSTEQVVAIASNKGGKQALEAVKAQLLALRAAPYA
SEQ ID NO: 235   LSTEQVVAIASNKGGKQALEAVKAQLLELRGAPYA
SEQ ID NO: 236   LSTEQVVAIASNNGGKQALEAVKALLLELRAAPYE
SEQ ID NO: 237   LSTEQVVAIASNNGGKQALEAVKAQLLALRAAPYE
SEQ ID NO: 238   LSTEQVVAIASNNGGKQALEAVKAQLLDLRGAPYA
SEQ ID NO: 239   LSTEQVVAIASNNGGKQALEAVKAQLLVLRAAPYG
SEQ ID NO: 240   LSTEQVVAIASNNGGKQALEAVKAQLPALRAAPYE
SEQ ID NO: 241   LSTEQVVAIASNNGGKQALEAVKAQLPVLRRAPCG
SEQ ID NO: 242   LSTEQVVAIASNNGGKQALEAVKAQLPVLRRAPYG
SEQ ID NO: 243   LSTEQVVAIASNNGGKQALEAVKARLLDLRGAPYA
SEQ ID NO: 244   LSTEQVVAIASNNGGKQALEAVKTQLLALRTAPYE
SEQ ID NO: 245   LSTEQVVAIASNPGGKQALEAVRALFPDLRAAPYA
SEQ ID NO: 246   LSTEQVVAIASSHGGKQALEAVRALFPDLRAAPYA
SEQ ID NO: 247   LSTEQVVAIASSHGGKQALEAVRALLPVLRATPYD
SEQ ID NO: 248   LSTEQVVAVASHNGGKQALEAVRAQLLDLRAAPYE
SEQ ID NO: 249   LSTEQVVAVASNKGGKQALAAVEAQLLRLRAAPYE
SEQ ID NO: 250   LSTEQVVAVASNKGGKQALEEVEAQLLRLRAAPYE
SEQ ID NO: 251   LSTEQVVAVASNKGGKQVLEAVGAQLLALRAVPYE
SEQ ID NO: 252   LSTEQVVAVASNNGGKQALKAVKAQLLALRAAPYE
SEQ ID NO: 253   LSTEQVVVIANSIGGKQALEAVKVQLPVLRAAPYE
SEQ ID NO: 254   LSTGQVVAIASNGGGRQALEAVREQLLALRAVPYE
SEQ ID NO: 255   LSVAQVVTIASHNGGKQALEAVRAQLLALRAAPYG
SEQ ID NO: 256   LTIAQVVAVASHNGGKQALEAIGAQLLALRAAPYA
SEQ ID NO: 257   LTIAQVVAVASHNGGKQALEVIGAQLLALRAAPYA
SEQ ID NO: 258   LTPQQVVAIAANTGGKQALGAITTQLPILRAAPYE
SEQ ID NO: 259   LTPQQVVAIASNTGGKQALEAVTVQLRVLRGARYG
SEQ ID NO: 260   LTPQQVVAIASNTGGKRALEAVCVQLPVLRAAPYR
SEQ ID NO: 261   LTPQQVVAIASNTGGKRALEAVRVQLPVLRAAPYE
SEQ ID NO: 262   LTTAQVVAIASNDGGKQALEAVGAQLLVLRAVPYE
SEQ ID NO: 263   LTTAQVVAIASNDGGKQTLEVAGAQLLALRAVPYE
SEQ ID NO: 336   LSTAQVVAVASGSGGKPALEAVRAQLLALRAAPYG
SEQ ID NO: 337   LSTAQVVAVASGSGGKPALEAVRAQLLALRAAPYG
SEQ ID NO: 338   LNTAQIVAIASHDGGKPALEAVWAKLPVLRGAPYA
SEQ ID NO: 339   LNTAQVVAIASHDGGKPALEAVRAKLPVLRGVPYA
SEQ ID NO: 340   LNTAQVVAIASHDGGKPALEAVWAKLPVLRGVPYA
SEQ ID NO: 341   LNTAQVVAIASHDGGKPALEAVWAKLPVLRGVPYE
SEQ ID NO: 342   LSTAQVVAIASHDGGKPALEAVWAKLPVLRGAPYA
SEQ ID NO: 343   LSTAQVVAVASHDGGKPALEAVRKQLPVLRGVPHQ
SEQ ID NO: 344   LSTAQVVAVASHDGGKPALEAVRKQLPVLRGVPHQ
SEQ ID NO: 345   LNTAQVVAIASHDGGKPALEAVWAKLPVLRGVPYA
SEQ ID NO: 346   LSTEQVVAIASHNGGKLALEAVKAHLLDLRGAPYA
SEQ ID NO: 347   LSTEQVVAIASHNGGKPALEAVKAHLLALRAAPYA
SEQ ID NO: 348   LNTAQVVAIASHYGGKPALEAVWAKLPVLRGVPYA
SEQ ID NO: 349   LNTEQVVAIASNNGGKPALEAVKAQLLELRAAPYE
SEQ ID NO: 350   LSPEQVVAIASNNGGKPALEAVKALLLALRAAPYE
SEQ ID NO: 351   LSPEQVVAIASNNGGKPALEAVKAQLLELRAAPYE
SEQ ID NO: 352   LSTEQVVAIASNNGGKPALEAVKALLLALRAAPYE
SEQ ID NO: 353   LSTEQVVAIASNNGGKPALEAVKALLLELRAAPYE
SEQ ID NO: 354   LSPEQVVAIASNNGGKPALEAVKALLLALRAAPYE
SEQ ID NO: 355   LSPEQVVAIASNNGGKPALEAVKAQLLELRAAPYE
SEQ ID NO: 356   LSTEQVVAIASNNGGKPALEAVKALLLELRAAPYE

In some embodiments, an RNBD of the present disclosure can comprise between 1 and 50 Ralstonia solanacearum-derived repeat units. In some embodiments, an RNBD of the present disclosure can comprise between 9 and 36 Ralstonia solanacearum-derived repeat units. Preferably, in some embodiments, an RNBD of the present disclosure can comprise between 12 and 30 Ralstonia solanacearum-derived repeat units. An RNBD described herein can comprise between 5 and 10, between 10 and 15, between 15 and 20, between 20 and 25, between 25 and 30, between 30 and 35, or between 35 and 40 Ralstonia solanacearum-derived repeat units. An RNBD described herein can comprise 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 Ralstonia solanacearum-derived repeat units.

A Ralstonia solanacearum-derived repeat unit can be derived from a wild-type repeat unit, such as any one of SEQ ID NO: 168-SEQ ID NO: 263 or SEQ ID NO: 336-SEQ ID NO: 356. A Ralstonia solanacearum-derived repeat unit can have at least 80% sequence identity with any one of SEQ ID NO: 168-SEQ ID NO: 263 or SEQ ID NO: 336-SEQ ID NO: 356. A Ralstonia solanacearum-derived repeat unit can also comprise a modified Ralstonia solanacearum-derived repeat unit enhanced for specific recognition of a nucleotide or base pair. An RNBD described herein can comprise one or more wild-type Ralstonia solanacearum-derived repeat units, one or more modified Ralstonia solanacearum-derived repeat units, or a combination thereof. In some embodiments, a modified Ralstonia solanacearum-derived repeat unit can comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, or 29 mutations that can enhance recognition of a specific nucleotide or base pair. In some embodiments, a modified Ralstonia solanacearum-derived repeat unit can comprise more than 1 modification, for example 1 to 5 modifications, 5 to 10 modifications, 10 to 15 modifications, 15 to 20 modifications, 20 to 25 modifications, or 25 to 29 modifications. In some embodiments, an RNBD can comprise more than one modified Ralstonia solanacearum-derived repeat unit, wherein each of the modified Ralstonia solanacearum-derived repeat units can have a different number of modifications.
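The 80% sequence identity threshold described above can be illustrated with a minimal Python sketch. It assumes the simplest possible case, an ungapped, position-by-position comparison of two repeat units of equal length; the function name `percent_identity` is hypothetical, and a real identity calculation would normally rely on a sequence alignment tool rather than this shortcut.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences.

    Assumption for this sketch: no alignment is performed, so the two
    sequences must already have the same length.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Two 35-residue repeat units from the sequence listing above.
SEQ_ID_NO_208 = "LSTEQVVAIASHDGGKQALEAVGAQLVALLAAPYA"
SEQ_ID_NO_209 = "LSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYA"

identity = percent_identity(SEQ_ID_NO_209, SEQ_ID_NO_208)
meets_threshold = identity >= 80.0  # the "at least 80%" criterion
```

In this example the two units differ at a single position (R versus L at position 30), so the comparison clears the 80% threshold comfortably.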

The Ralstonia solanacearum-derived repeat units comprise amino acid residues at positions 12 and 13, referred to herein as a repeat variable diresidue (RVD). The RVD can modulate binding affinity of the repeat unit for a particular nucleic acid base (e.g., adenosine, guanine, cytosine, thymidine, or uracil (in RNA sequences)). In some embodiments, a single amino acid residue can modulate binding to the target nucleic acid base. In some embodiments, two amino acid residues (RVD) can modulate binding to the target nucleic acid base. In some embodiments, any repeat unit disclosed herein can have an RVD selected from HD, HG, HK, HN, ND, NG, NH, NK, NN, NP, NT, QN, RN, RS, SH, SI, or SN. In some embodiments, an RVD of HD can bind to cytosine. In some embodiments, an RVD of NG can bind to thymidine. In some embodiments, an RVD of NK can bind to guanine. In some embodiments, an RVD of SI can bind to adenosine. In some embodiments, an RVD of RS can bind to adenosine. In some embodiments, an RVD of HN can bind to guanine. In some embodiments, an RVD of NT can bind to adenosine.
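As an illustration of the RVD positions and base preferences described above, the following Python sketch reads positions 12 and 13 of a repeat unit and looks up the base named for that RVD. The dictionary holds only the seven RVD-to-base pairings stated in the preceding paragraph; the names `RVD_TO_BASE`, `rvd_of`, and `target_base` are hypothetical helpers, not part of the disclosure.

```python
from typing import Optional

# RVD-to-base pairings stated in the text above (other listed RVDs
# have no base assigned there, so they are deliberately left out).
RVD_TO_BASE = {
    "HD": "cytosine",
    "NG": "thymidine",
    "NK": "guanine",
    "SI": "adenosine",
    "RS": "adenosine",
    "HN": "guanine",
    "NT": "adenosine",
}


def rvd_of(repeat_unit: str) -> str:
    """Return the residues at positions 12 and 13 (1-based) of a repeat unit."""
    return repeat_unit[11:13]


def target_base(repeat_unit: str) -> Optional[str]:
    """Return the base named for the unit's RVD, or None if not in the table."""
    return RVD_TO_BASE.get(rvd_of(repeat_unit))


SEQ_ID_NO_209 = "LSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYA"
SEQ_ID_NO_218 = "LSTEQVVAIASHNGGKQALEAVKADLLELRGAPYA"
SEQ_ID_NO_253 = "LSTEQVVVIANSIGGKQALEAVKVQLPVLRAAPYE"

bases = [target_base(u) for u in (SEQ_ID_NO_209, SEQ_ID_NO_218, SEQ_ID_NO_253)]
```

The three example units carry the RVDs HD, HN, and SI, respectively.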

In some embodiments, a repeat unit having at least 80% sequence identity with SEQ ID NO: 209 can be included in a DNA binding domain of the present disclosure to bind to cytosine. In some embodiments, a repeat unit having at least 80% sequence identity with SEQ ID NO: 197 can be included in a DNA binding domain of the present disclosure to bind to thymidine. In some embodiments, a repeat unit having at least 80% sequence identity with SEQ ID NO: 233 can be included in a DNA binding domain of the present disclosure to bind to guanine. In some embodiments, a repeat unit having at least 80% sequence identity with SEQ ID NO: 253 can be included in a DNA binding domain of the present disclosure to bind to adenosine. In some embodiments, a repeat unit having at least 80% sequence identity with SEQ ID NO: 203 can be included in a DNA binding domain of the present disclosure to bind to adenosine. In some embodiments, a repeat unit having at least 80% sequence identity with SEQ ID NO: 218 can be included in a DNA binding domain of the present disclosure to bind to guanine. In some embodiments, the repeat unit of SEQ ID NO: 209 can be included in a DNA binding domain of the present disclosure to bind to cytosine. In some embodiments, the repeat unit of SEQ ID NO: 197 can be included in a DNA binding domain of the present disclosure to bind to thymidine. In some embodiments, the repeat unit of SEQ ID NO: 233 can be included in a DNA binding domain of the present disclosure to bind to guanine. In some embodiments, the repeat unit of SEQ ID NO: 253 can be included in a DNA binding domain of the present disclosure to bind to adenosine. In some embodiments, the repeat unit of SEQ ID NO: 203 can be included in a DNA binding domain of the present disclosure to bind to adenosine. In some embodiments, the repeat unit of SEQ ID NO: 218 can be included in a DNA binding domain of the present disclosure to bind to guanine.

In some embodiments, the present disclosure provides repeat units as set forth in SEQ ID NO: 267-SEQ ID NO: 279. Unspecified amino acid residues in SEQ ID NO: 267-SEQ ID NO: 279 can be any amino acid residues. In particular embodiments, unspecified amino acid residues in SEQ ID NO: 267-SEQ ID NO: 279 can be those set forth in the Variable Definition column of TABLE 3.

TABLE 3 shows consensus sequences of Ralstonia-derived repeat units.

TABLE 3 Consensus Sequences of Ralstonia-derived Repeat Units

RVD: HN
Consensus Sequence: LX₁X₂X₃QVVX₄X₅ASHNGX₆KQALEX₇X₈X₉X₁₀X₁₁LX₁₂X₁₃LX₁₄X₁₅X₁₆PYX₁₇ (SEQ ID NO: 267)
Variable Definition: X₁: D|N|S|T, X₂: I|T|V, X₃: A|E, X₄: A|T, X₅: I|V, X₆: G|S, X₇: A|V, X₈: I|V, X₉: G|K|R, X₁₀: A|T, X₁₁: D|H|Q, X₁₂: L|P, X₁₃: A|D|E|V, X₁₄: L|R, X₁₅: A|G|R, X₁₆: A|V, X₁₇: A|E|G|V

RVD: NN
Consensus Sequence: LX₁X₂X₃QVVAX₄AX₅NNGGKQALX₆AVX₇X₈X₉LX₁₀X₁₁LRX₁₂AX₁₃X₁₄X₁₅ (SEQ ID NO: 268)
Variable Definition: X₁: N|S, X₂: P|T, X₃: A|E, X₄: I|V, X₅: A|S, X₆: E|K, X₇: K|R, X₈: A|T, X₉: H|L|Q|R, X₁₀: L|P, X₁₁: A|D|E|V, X₁₂: A|G|R|T|V, X₁₃: P|R, X₁₄: C|Y, X₁₅: A|E|G

RVD: NP
Consensus Sequence: LX₁TX₂QX₃VX₄IASNPGGKQALEAX₅RAX₆FX₇X₈X₉RAAPYA (SEQ ID NO: 269)
Variable Definition: X₁: N|S, X₂: A|E, X₃: L|V, X₄: A|S, X₅: I|V, X₆: L|P, X₇: P|R, X₈: D|E, X₉: L|V

RVD: SH
Consensus Sequence: LX₁TX₂QVVAIASSHGGKQALEAVRALX₃X₄X₅LRAX₆PYX₇ (SEQ ID NO: 270)
Variable Definition: X₁: N|S, X₂: A|E, X₃: F|L, X₄: P|R, X₅: D|E|V, X₆: A|T, X₇: A|D|G

RVD: NK
Consensus Sequence: LX₁TEQVVAX₂ASNKGGKQX₃LX₄X₅VX₆AX₇LLX₈LX₉X₁₀X₁₁PYX₁₂ (SEQ ID NO: 271)
Variable Definition: X₁: N|S, X₂: I|V, X₃: A|V, X₄: A|E, X₅: A|E, X₆: E|G|K, X₇: D|H|Q, X₈: A|D|E|R, X₉: L|R, X₁₀: A|G, X₁₁: A|V, X₁₂: A|E|V

RVD: HD
Consensus Sequence: LSX₁X₂QVX₃AIAX₄HDGGX₅QX₆LEAX₇X₈X₉QLVX₁₀LX₁₁AAPYA (SEQ ID NO: 272)
Variable Definition: X₁: A|T, X₂: A|E, X₃: A|V, X₄: G|S, X₅: K|N, X₆: A|P, X₇: A|V, X₈: G|V, X₉: A|G|T|V, X₁₀: A|E|V, X₁₁: L|R

RVD: RS
Consensus Sequence: LSX₁AQVVAX₂AX₃RSGGKQALEAVRAQLLX₄LRAAPYG (SEQ ID NO: 273)
Variable Definition: X₁: I|T, X₂: I|V, X₃: S|T, X₄: A|D

RVD: NH
Consensus Sequence: LSX₁EQVVAIASNHGGKQALEAVRALFRX₂LRAAPYX₃ (SEQ ID NO: 274)
Variable Definition: X₁: P|T, X₂: E|G, X₃: A|G

RVD: SI
Consensus Sequence: LSTX₁QVX₂X₃IAX₄SIGGX₅QALEAX₆KVQLPVLRAAPYX₇ (SEQ ID NO: 275)
Variable Definition: X₁: A|E, X₂: A|V, X₃: T|V, X₄: N|S, X₅: K|R, X₆: L|V, X₇: E|G

RVD: ND
Consensus Sequence: LX₁TAQVVAIASNDGGKQX₂LEX₃X₄X₅AQLLX₆LRAX₇PYE (SEQ ID NO: 276)
Variable Definition: X₁: S|T, X₂: A|T, X₃: A|E|V, X₄: A|V, X₅: E|G, X₆: A|V, X₇: A|V

RVD: SN
Consensus Sequence: LSTAQVVX₁X₂ASSNGGKQALEAVWALLPVLRATPYD (SEQ ID NO: 277)
Variable Definition: X₁: A|T, X₂: I|V

RVD: NG
Consensus Sequence: LSTX₁QVVAIAX₂NGGGX₃QALEX₄X₅X₆X₇QLX₈X₉LRX₁₀X₁₁PX₁₂X₁₃ (SEQ ID NO: 278)
Variable Definition: X₁: A|E|G, X₂: G|S, X₃: K|R, X₄: A|G, X₅: I|V, X₆: G|R, X₇: E|K, X₈: L|Q|R, X₉: A|E|K, X₁₀: A|T, X₁₁: A|V, X₁₂: H|Y, X₁₃: E|G

RVD: NT
Consensus Sequence: LTPQQVVAIAX₁NTGGKX₂ALX₃AX₄X₅X₆QLX₇X₈LRX₉AX₁₀YX₁₁ (SEQ ID NO: 279)
Variable Definition: X₁: A|S, X₂: Q|R, X₃: E|G, X₄: I|V, X₅: C|R|T, X₆: T|V, X₇: P|R, X₈: I|V, X₉: A|G, X₁₀: P|R, X₁₁: E|G|R
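One way to work with the consensus sequences of TABLE 3 is to turn a consensus and its Variable Definition into a regular expression and test whether a given repeat unit fits it. The Python sketch below does this for the SN consensus (SEQ ID NO: 277). It assumes the subscripted placeholders are written in ASCII as `X1`, `X2`, and so on; the function name `consensus_to_regex` is hypothetical.

```python
import re


def consensus_to_regex(consensus: str, variables: dict) -> "re.Pattern":
    """Build a regex from a consensus such as 'LSTAQVVX1X2ASS...'.

    `variables` maps placeholder names (e.g. 'X1') to alternatives written
    as in TABLE 3 (e.g. 'A|T'). Placeholders are substituted from the
    highest number down so that 'X1' does not clobber 'X10'.
    """
    pattern = consensus
    for name in sorted(variables, key=lambda n: int(n[1:]), reverse=True):
        allowed = variables[name].replace("|", "")
        pattern = pattern.replace(name, "[" + allowed + "]")
    return re.compile(pattern)


# SN consensus (SEQ ID NO: 277) with its two variable positions.
SN_CONSENSUS = "LSTAQVVX1X2ASSNGGKQALEAVWALLPVLRATPYD"
SN_VARIABLES = {"X1": "A|T", "X2": "I|V"}
sn_regex = consensus_to_regex(SN_CONSENSUS, SN_VARIABLES)

# A repeat unit with an SN RVD, taken from the sequence listing above.
sn_unit = "LSTAQVVTIASSNGGKQALEAVWALLPVLRATPYD"
fits_sn = sn_regex.fullmatch(sn_unit) is not None

# A repeat unit with an HD RVD should not fit the SN consensus.
hd_unit = "LSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYA"
fits_hd = sn_regex.fullmatch(hd_unit) is not None
```

Using `fullmatch` rather than `search` keeps the check strict: the whole 35-residue unit has to agree with the consensus, not just a fragment of it.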

In some aspects, the at least one repeat unit comprises any one of SEQ ID NO: 267-SEQ ID NO: 279. In some embodiments, the present disclosure provides a modular nucleic acid binding domain (e.g., RNBD or MAP-NBD), wherein the modular nucleic acid binding domain comprises a repeat unit with a sequence of A₁₋₁₁X₁X₂B₁₄₋₃₅ (SEQ ID NO: 448), wherein A₁₋₁₁ comprises 11 amino acid residues and wherein each amino acid residue of A₁₋₁₁ can be any amino acid. In some embodiments, A₁₋₁₁ can be any amino acids in position 1 through position 11 of any one of SEQ ID NO: 168-SEQ ID NO: 263 or SEQ ID NO: 336-SEQ ID NO: 356. X₁X₂ comprises any repeat variable diresidue (RVD) disclosed herein and comprises at least one amino acid at position 12 or position 13. As described herein, this RVD contacts and binds to a target nucleic acid base of a target site. Said RVD can be the RVD of any repeat unit disclosed herein, such as position 12 and position 13 of any one of SEQ ID NO: 168-SEQ ID NO: 263 or SEQ ID NO: 336-SEQ ID NO: 356. B₁₄₋₃₅ can comprise 22 amino acid residues and each amino acid residue of B₁₄₋₃₅ can be any amino acid. In some embodiments, B₁₄₋₃₅ can be any amino acid in position 14 through position 35 of any one of SEQ ID NO: 168-SEQ ID NO: 263 or SEQ ID NO: 336-SEQ ID NO: 356. In particular embodiments, a modular nucleic acid binding domain (e.g., RNBD or MAP-NBD) having the above sequence of A₁₋₁₁X₁X₂B₁₄₋₃₅ (SEQ ID NO: 448) can have a first repeat unit with at least one residue in A₁₋₁₁, B₁₄₋₃₅, or a combination thereof that differs from a corresponding residue in a second repeat unit in the modular nucleic acid binding domain (e.g., RNBD or MAP-NBD). In other words, at least two repeat units in a modular nucleic acid binding domain (e.g., RNBD or MAP-NBD) described herein can have different amino acid residues with respect to each other, at the same position outside the RVD region. Thus, in some embodiments, a modular nucleic acid binding domain (e.g., RNBD or MAP-NBD) described herein can have variant backbones with respect to each repeat unit in the plurality of repeat units that make up the modular nucleic acid binding domain. In some embodiments, an RNBD of the present disclosure can have a sequence of GGKQALEAVRAQLLDLRAAPYG (SEQ ID NO: 280) at B₁₄₋₃₅.
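The A₁₋₁₁X₁X₂B₁₄₋₃₅ layout described above can be sketched in Python as a simple concatenation of three segments. The example below is an illustration under stated assumptions: it borrows positions 1 through 11 of SEQ ID NO: 209 as the A segment and SEQ ID NO: 280 as the B segment, and the helper names `build_repeat_unit`, `split_repeat_unit`, and `backbones_differ` are hypothetical.

```python
def build_repeat_unit(a_1_11: str, rvd: str, b_14_35: str) -> str:
    """Concatenate A(1-11), the RVD (12-13), and B(14-35) into one unit."""
    if len(a_1_11) != 11 or len(rvd) != 2 or len(b_14_35) != 22:
        raise ValueError("expected segments of 11, 2, and 22 residues")
    return a_1_11 + rvd + b_14_35


def split_repeat_unit(unit: str) -> tuple:
    """Split a 35-residue unit back into (A, RVD, B)."""
    if len(unit) != 35:
        raise ValueError("expected a 35-residue repeat unit")
    return unit[:11], unit[11:13], unit[13:]


def backbones_differ(unit_1: str, unit_2: str) -> bool:
    """True if two units differ at any position outside the RVD."""
    a1, _, b1 = split_repeat_unit(unit_1)
    a2, _, b2 = split_repeat_unit(unit_2)
    return a1 != a2 or b1 != b2


A_FROM_SEQ_209 = "LSTEQVVAIAS"               # positions 1-11 of SEQ ID NO: 209
B_SEQ_ID_NO_280 = "GGKQALEAVRAQLLDLRAAPYG"   # SEQ ID NO: 280

unit_1 = build_repeat_unit(A_FROM_SEQ_209, "HD", B_SEQ_ID_NO_280)
unit_2 = "LSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYA"  # SEQ ID NO: 209 as listed

variant_backbones = backbones_differ(unit_1, unit_2)
```

Here both units carry the same RVD (HD) but different B segments, which is one reading of the "variant backbones" language in the paragraph above.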

In some embodiments, a modular nucleic acid binding sequence (e.g., RNBD) can comprise one or more of the following characteristics: the modular nucleic acid binding sequence (e.g., RNBD) can bind a nucleic acid sequence, wherein the target site comprises a 5′ guanine, the modular nucleic acid binding domain (e.g., RNBD) can comprise 7 repeat units to 25 repeat units, a first modular nucleic acid binding sequence (e.g., RNBD) can bind a target nucleic acid sequence and be separated from a second modular nucleic acid binding domain (e.g., RNBD) from 2 to 50 base pairs, or any combination thereof.
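The characteristics listed above can be checked one at a time, as in the following Python sketch. Because the text allows any combination of these characteristics, the function reports each one separately rather than requiring all of them. The function name `characteristics`, its parameters, and the 15 base pair spacing used in the example are hypothetical.

```python
from typing import Optional


def characteristics(target_site: str, n_repeat_units: int,
                    gap_bp: Optional[int] = None) -> dict:
    """Report which of the three listed characteristics a design satisfies.

    gap_bp is the separation, in base pairs, between the sites bound by a
    first and a second binding domain; None means no second domain.
    """
    return {
        "five_prime_guanine": target_site[:1].upper() == "G",
        "repeat_units_7_to_25": 7 <= n_repeat_units <= 25,
        "spacing_2_to_50_bp": gap_bp is not None and 2 <= gap_bp <= 50,
    }


# Target site from the PDCD1 example in this disclosure (SEQ ID NO: 312),
# with an assumed 19 repeat units and an assumed 15 bp spacing.
report = characteristics("GACCTGGGACAGTTTCCCTT", 19, gap_bp=15)
```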

In some embodiments, an RNBD of the present disclosure can have the full length naturally occurring N-terminus of a naturally occurring Ralstonia solanacearum-derived protein. In some embodiments, any truncation of the full length naturally occurring N-terminus of a naturally occurring Ralstonia solanacearum-derived protein can be used at the N-terminus of an RNBD of the present disclosure. For example, in some embodiments, amino acid residues at position 1 (H) to position 137 (F) of the naturally occurring Ralstonia solanacearum-derived protein N-terminus can be used. In particular embodiments, said truncated N-terminus from position 1 (H) to position 137 (F) can have a sequence as follows: FGKLVALGYSREQIRKLKQESLSEIAKYHTTLTGQGFTHADICRISRRRQSLRVVARNYPELAAALPELTRAHIVDIARQRSGDLALQALLPVATALTAAPLRLSASQIATVAQYGERPAIQALYRLRRKLTRAPLH (SEQ ID NO: 264). In some embodiments, the naturally occurring N-terminus of Ralstonia solanacearum can be truncated to any length and used at the N-terminus of the engineered DNA binding domain. For example, the naturally occurring N-terminus of Ralstonia solanacearum can be truncated to amino acid residues at position 1 (H) to position 120 (K) as follows: KQESLSEIAKYHTTLTGQGFTHADICRISRRRQSLRVVARNYPELAAALPELTRAHIVDIARQRSGDLALQALLPVATALTAAPLRLSASQIATVAQYGERPAIQALYRLRRKLTRAPLH (SEQ ID NO: 303) and used at the N-terminus of the RNBD. The naturally occurring N-terminus of Ralstonia solanacearum can be truncated to amino acid residues at positions 1 to 115 and used at the N-terminus of the engineered DNA binding domain as set forth in SEQ ID NO: 320. The naturally occurring N-terminus of Ralstonia solanacearum can be truncated to amino acid residues at positions 1 to 50, 1 to 70, 1 to 100, 1 to 120, 1 to 130, 10 to 40, 60 to 100, or 100 to 120 and used at the N-terminus of the engineered DNA binding domain. Truncation of the N-termini can be particularly advantageous for obtaining DNA binding domains that are smaller in size, both in number of amino acids and in overall molecular weight. A reduced number of amino acids can allow for more efficient packaging into a viral vector, and a smaller molecular weight can result in more efficient loading of the DNA binding domains in non-viral vectors for delivery.
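The position numbering used for these N-terminal truncations can be illustrated with a short Python sketch. Judging from the sequences given above, position 1 (H) is the last residue of the N-terminus (the residue nearest the repeat units) and position 137 (F) is the first, so truncating to positions 1 through k amounts to keeping the last k residues. That reading is an inference from the listed sequences, and the helper name `truncate_n_terminus` is hypothetical.

```python
# SEQ ID NO: 264, positions 137 (F) down to 1 (H), as written in the text.
N_TERMINUS_SEQ_264 = (
    "FGKLVALGYSREQIRKLKQESLSEIAKYHTTLTGQGFTHADICRISRRRQ"
    "SLRVVARNYPELAAALPELTRAHIVDIARQRSGDLALQALLPVATALTAA"
    "PLRLSASQIATVAQYGERPAIQALYRLRRKLTRAPLH"
)


def truncate_n_terminus(n_terminus: str, keep: int) -> str:
    """Keep positions 1..keep, counted back from the repeat-proximal end.

    Assumption: position 1 is the final residue of the N-terminus, so the
    truncation is the last `keep` residues of the written sequence.
    """
    if not 1 <= keep <= len(n_terminus):
        raise ValueError("keep must be between 1 and the N-terminus length")
    return n_terminus[-keep:]


n_cap_120 = truncate_n_terminus(N_TERMINUS_SEQ_264, 120)  # cf. SEQ ID NO: 303
n_cap_115 = truncate_n_terminus(N_TERMINUS_SEQ_264, 115)  # cf. SEQ ID NO: 320
```

Under this reading, the 120-residue truncation begins with K and the 115-residue truncation begins with S, matching the residues named for positions 120 and 115 in the text and in TABLE 4.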

In some embodiments, the N-terminus, referred to as the amino terminus or the “NH2” domain, can recognize a guanine. In some embodiments, the N-terminus can be engineered to bind a cytosine, adenosine, thymidine, guanine, or uracil.

In some embodiments, an RNBD of the present disclosure can have a DNA binding domain, in which the final full length repeat unit of 33-35 amino acid residues is followed by a half-repeat also derived from Ralstonia solanacearum. The half repeat can have 15 to 23 amino acid residues, for example, the half repeat can have 19 amino acid residues. In particular embodiments, the half-repeat can have a sequence as follows: LSTAQVVAIACISGQQALE (SEQ ID NO: 265).

In some embodiments, an RNBD of the present disclosure can have the full length naturally occurring C-terminus of a naturally occurring Ralstonia solanacearum-derived protein. In some embodiments, any truncation of the full length naturally occurring C-terminus of a naturally occurring Ralstonia solanacearum-derived protein can be used at the C-terminus of an RNBD of the present disclosure. For example, in some embodiments, the RNBD can comprise amino acid residues at position 1 (A) to position 63 (S) of the naturally occurring Ralstonia solanacearum-derived protein C-terminus as follows: AIEAHMPTLRQASHSLSPERVAAIACIGGRSAVEAVRQGLPVKAIRRIRREKAPVAGPPPAS (SEQ ID NO: 266). In some embodiments, the naturally occurring C-terminus of Ralstonia solanacearum can be truncated to any length and used at the C-terminus of the RNBD. For example, the naturally occurring C-terminus of Ralstonia solanacearum can be truncated to amino acid residues at positions 1 to 63 and used at the C-terminus of the RNBD. The naturally occurring C-terminus of Ralstonia solanacearum can be truncated to amino acid residues at positions 1 to 50 and used at the C-terminus of the RNBD. The naturally occurring C-terminus of Ralstonia solanacearum can be truncated to amino acid residues at positions 1 to 63, 1 to 50, 1 to 70, 1 to 100, 1 to 120, 1 to 130, 10 to 40, 60 to 100, or 100 to 120 and used at the C-terminus of the RNBD.

TABLE 4 shows N-termini, C-termini, and half-repeats derived from Ralstonia.

TABLE 4 Ralstonia-Derived N-terminus, C-terminus, and Half-Repeat

SEQ ID NO | Description | Sequence
SEQ ID NO: 320 | Truncated N-terminus; positions 1 (H) to 115 (S) of the naturally occurring Ralstonia solanacearum-derived protein N-terminus | SEIAKYHTTLTGQGFTHADICRISRRRQSLRVVARNYPELAAALPELTRAHIVDIARQRSGDLALQALLPVATALTAAPLRLSASQIATVAQYGERPAIQALYRLRRKLTRAPLH
SEQ ID NO: 264 | Truncated N-terminus; positions 1 (H) to 137 (F) of the naturally occurring Ralstonia solanacearum-derived protein N-terminus | FGKLVALGYSREQIRKLKQESLSEIAKYHTTLTGQGFTHADICRISRRRQSLRVVARNYPELAAALPELTRAHIVDIARQRSGDLALQALLPVATALTAAPLRLSASQIATVAQYGERPAIQALYRLRRKLTRAPLH
SEQ ID NO: 303 | Truncated N-terminus; positions 1 (H) to 120 (K) of the naturally occurring Ralstonia solanacearum-derived protein N-terminus | KQESLSEIAKYHTTLTGQGFTHADICRISRRRQSLRVVARNYPELAAALPELTRAHIVDIARQRSGDLALQALLPVATALTAAPLRLSASQIATVAQYGERPAIQALYRLRRKLTRAPLH
SEQ ID NO: 265 | Half-repeat | LSTAQVVAIACISGQQALE
SEQ ID NO: 266 | Truncated C-terminus; positions 1 (A) to 63 (S) of the naturally occurring Ralstonia solanacearum-derived protein C-terminus | AIEAHMPTLRQASHSLSPERVAAIACIGGRSAVEAVRQGLPVKAIRRIRREKAPVAGPPPAS

In some embodiments, an RNBD can be engineered to target and bind to asite in the PDCD1 gene. For example, an RNBD with the sequenceFGKLVALGYSREQIRKLKQESLSEIAKYHTTLTGQGFTHADICRISRRRQSLRVVARNYPELAAALPELTRAHIVDIARQRSGDLALQALLPVATALTAAPLRLSASQIATVAQYGERPAIQALYRLRRKLTRAPLHLTPQQVVAIASNTGGKRALEAVCVQLPVLRAAPYRLSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYALSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYALSTAQVVAIASNGGGKQALEGIGEQLLKLRTAPYGLSTEQVVAIASNKGGKQALEAVKAHLLDLLGAPYVLSTEQVVAIASNKGGKQALEAVKAHLLDLLGAPYVLSTEQVVAIASNKGGKQALEAVKAHLLDLLGAPYVLSTEQVVVIANSIGGKQALEAVKVQLPVLRAAPYELSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYALSTEQVVVIANSIGGKQALEAVKVQLPVLRAAPYELSTEQVVAIASNKGGKQALEAVKAHLLDLLGAPYVLSTAQVVAIASNGGGKQALEGIGEQLLKLRTAPYGLSTAQVVAIASNGGGKQALEGIGEQLLKLRTAPYGLSTAQVVAIASNGGGKQALEGIGEQLLKLRTAPYGLSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYALSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYALSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYALSTAQVVAIASNGGGKQALEGIGEQLLKLRTAPYGLSTAQVVAIASNGGGKQALEGIGEQLLKLRTAPYGLSTAQVVAIACISGQQALEAIEAHMPTLRQASHSLSPERVAAIACIGGRSAVEAVRQGLPVKAIRRIRREKAPVAGPPPAS (SEQ ID NO: 311) can bind to theGACCTGGGACAGTTTCCCTT (SEQ ID NO: 312) nucleic acid sequence in the PDCD1gene. 
As another example, an RNBD with the sequenceFGKLVALGYSREQIRKLKQESLSEIAKYHTTLTGQGFTHADICRISRRRQSLRVVARNYPELAAALPELTRAHIVDIARQRSGDLALQALLPVATALTAAPLRLSASQIATVAQYGERPAIQALYRLRRKLTRAPLHLTPQQVVAIASNTGGKRALEAVCVQLPVLRAAPYRLSTAQVVAIASNGGGKQALEGIGEQLLKLRTAPYGLSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYALSTAQVVAIASNGGGKQALEGIGEQLLKLRTAPYGLSTEQVVAIASHNGGKQALEAVKADLLELRGAPYALSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYALSTEQVVVIANSIGGKQALEAVKVQLPVLRAAPYELSTAQVVAIASNGGGKQALEGIGEQLLKLRTAPYGLSTEQVVAIASHNGGKQALEAVKADLLELRGAPYALSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYALSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYALSTAQVVAIASNGGGKQALEGIGEQLLKLRTAPYGLSTEQVVAIASHNGGKQALEAVKADLLELRGAPYALSTEQVVAIASHNGGKQALEAVKADLLELRGAPYALSTEQVVVIANSIGGKQALEAVKVQLPVLRAAPYELSTEQVVAIASHNGGKQALEAVKADLLELRGAPYALSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYALSTAQVVAIACISGQQALEAIEAHMPTLRQASHSLSPERVAAIACIGGRSAVEAVRQGLPVKAIRRIRREKAPVAGPPPAS (SEQ ID NO: 313) can bind to theGATCTGCATGCCTGGAGC (SEQ ID NO: 314) nucleic acid sequence in the PDCD1gene. As yet another example, an RNBD with the sequenceFGKLVALGYSREQIRKLKQESLSEIAKYHTTLTGQGFTHADICRISRRRQSLRVVARNYPELAAALPELTRAHIVDIARQRSGDLALQALLPVATALTAAPLRLSASQIATVAQYGERPAIQALYRLRRKLTRAPLHLTPQQVVAIASNTGGKRALEAVCVQLPVLRAAPYRLSTAQVVAIASNGGGKQALEGIGEQLLKLRTAPYGLSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYALSTAQVVAIASNGGGKQALEGIGEQLLKLRTAPYGLSTEQVVAIASHNGGKQALEAVKADLLELRGAPYALSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYALSTAQVVAIATRSGGKQALEAVRAQLLDLRAAPYGLSTAQVVAIASNGGGKQALEGIGEQLLKLRTAPYGLSTEQVVAIASHNGGKQALEAVKADLLELRGAPYALSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYALSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYALSTAQVVAIASNGGGKQALEGIGEQLLKLRTAPYGLSTEQVVAIASHNGGKQALEAVKADLLELRGAPYALSTEQVVAIASHNGGKQALEAVKADLLELRGAPYALSTAQVVAIATRSGGKQALEAVRAQLLDLRAAPYGLSTEQVVAIASHNGGKQALEAVKADLLELRGAPYALSTEQVVAIASHDGGKQALEAVGAQLVALRAAPYALSTAQVVAIACISGQQALEAIEAHMPTLRQASHSLSPERVAAIACIGGRSAVEAVRQGLPVKAIRRIRREKAPVAGPPPAS (SEQ ID NO: 315) can bind to theGATCTGCATGCCTGGAGC (SEQ ID NO: 314) nucleic acid sequence in the PDCD1gene. 
Any one of SEQ ID NO: 311, SEQ ID NO: 313, or SEQ ID NO: 315 can be fused to any repression domain described herein (e.g., KRAB) to yield a gene repressor capable of repressing expression of the target gene.
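The correspondence between the repeat units of these example RNBDs and their PDCD1 target sites can be traced with a short Python sketch. The RVD lists below were read off positions 12 and 13 of each full repeat unit in SEQ ID NO: 313 and SEQ ID NO: 315 as printed above. The sketch assumes that the 5′ guanine of the target is contributed by the N-terminus and that the trailing half-repeat is not decoded; the names `RVD_TO_BASE` and `predicted_target` are hypothetical.

```python
# RVD-to-base pairings stated earlier in this disclosure (one-letter bases).
RVD_TO_BASE = {
    "HD": "C", "NG": "T", "NK": "G", "SI": "A",
    "RS": "A", "HN": "G", "NT": "A",
}

# RVDs of the 17 full repeat units, in order, as read from the sequences.
RVDS_SEQ_313 = ["NT", "NG", "HD", "NG", "HN", "HD", "SI", "NG", "HN",
                "HD", "HD", "NG", "HN", "HN", "SI", "HN", "HD"]
RVDS_SEQ_315 = ["NT", "NG", "HD", "NG", "HN", "HD", "RS", "NG", "HN",
                "HD", "HD", "NG", "HN", "HN", "RS", "HN", "HD"]


def predicted_target(rvds: list, five_prime_base: str = "G") -> str:
    """Decode an ordered RVD list into a target site.

    Assumption: the base preceding the first repeat unit (here a 5' G)
    is supplied by the N-terminus rather than by a repeat unit.
    """
    return five_prime_base + "".join(RVD_TO_BASE[rvd] for rvd in rvds)


TARGET_SEQ_314 = "GATCTGCATGCCTGGAGC"
target_313 = predicted_target(RVDS_SEQ_313)
target_315 = predicted_target(RVDS_SEQ_315)
```

Both RVD lists decode to the same site because SI and RS are each described as binding adenosine, which is consistent with SEQ ID NO: 313 and SEQ ID NO: 315 being directed to the same target sequence.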

Xanthomonas Derived Transcription Activator Like Effector (TALE)

The present disclosure provides a modular nucleic acid binding domain derived from Xanthomonas spp., also referred to herein as a transcription activator-like effector (TALE) protein, which can comprise a plurality of repeat units. A repeat unit of the plurality of repeat units recognizes a single target nucleotide, base pair, or both. A repeat unit from Xanthomonas spp. can comprise 33-35 amino acid residues. In some embodiments, a repeat unit can be derived from a Xanthomonas spp. protein having a sequence of

(SEQ ID NO: 299) MDPIRSRTPSPARELLPGPQPDGVQPTADRGVSPPAGGPLDGLPARRTMSRTRLPSPPAPSPAFSAGSFSDLLRQFDPSLFNTSLFDSLPPFGAHHTEAATGEWDEVQSGLRAADAPPPTMRVAVTAARPPRAKPAPRRRAAQPSDASPAAQVDLRTLGYSQQQQEKIKPKVRSTVAQHHEALVGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEATHEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDTGQLLKIAKRGGVTAVEAVHAWRNALTGAPLNLTPEQVVAIASHDGGKQALETVQRLLPVLCQAHGLTPQQVVAIASNGGGKQALETVQRLLPVLCQAHGLTPQQVVAIASNSGGKQALETVQRLLPVLCQAHGLTPEQVVAIASNGGGKQALETVQRLLPVLCQAHGLTPEQVVAIASNIGGKQALETVQALLPVLCQAHGLTPEQVVAIASNIGGKQALETVQALLPVLCQAHGLTPEQVVAIASNIGGKQALETVQALLPVLCQAHGLTPEQVVAIASHDGGKQALETVQRLLPVLCQAHGLTPEQVVAIASHDGGKQALETVQRLLPVLCQAHGLTPQQVVAIASNGGGKQALETVQRLLPVLCQAHGLTPEQVVAIASNSGGKQALETVQALLPVLCQAHGLTPEQVVAIASNSGGKQALETVQRLLPVLCQAHGLTPEQVVAIASHDGGKQALETVQRLLPVLCQAHGLTPEQVVAIASHDGGKQALETVQRLLPVLCQAHGLTPEQVVAIASHDGGKQALETVQRLLPVLCQAHGLTPQQVVAIASNGGGRPALETVQRLLPVLCQAHGLTPEQVVAIASHDGGKQALETVQRLLPVLCQAHGLTPQQVVAIASNGGGRPALESIVAQLSRPDPALAALTNDHLVALACLGGRPALDAVKKGLPHAPALIKRTNRRIPERTSHRVADHAQVVRVLGFFQCHSHPAQAFDDAMTQFGMSRHGLLQLFRRVGVTELEARSGTLPPASQRWDRILQASGMKRAKPSPTSTQTPDQASLHAFADSLERDLDAPSPMHEGDQTRASSRKRSRSDRAVTGPSAQQSFEVRVPEQRDALHLPLSWRVKRPRTSIGGGLPDPGTPTAADLAASSTVMREQDEDPFAGAADDFPAF NEEELAWLMELLPQ.

In some embodiments, a TALE of the present disclosure can comprise between 1 and 50 Xanthomonas spp.-derived repeat units. In some embodiments, a TALE of the present disclosure can comprise between 9 and 36 Xanthomonas spp.-derived repeat units. Preferably, in some embodiments, a TALE of the present disclosure can comprise between 12 and 30 Xanthomonas spp.-derived repeat units. A TALE described herein can comprise between 5 and 10, between 10 and 15, between 15 and 20, between 20 and 25, between 25 and 30, between 30 and 35, or between 35 and 40 Xanthomonas spp.-derived repeat units. A TALE described herein can comprise at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 or more Xanthomonas spp.-derived repeat units, such as repeat units derived from a Xanthomonas spp. protein having the amino acid sequence set forth in SEQ ID NO: 299.

A Xanthomonas spp.-derived repeat unit can be derived from a wild-type repeat unit, such as any one of SEQ ID NO: 323-SEQ ID NO: 326. For example, a Xanthomonas spp.-derived repeat unit can have a sequence of LTPDQVVAIASNHGGKQALETVQRLLPVLCQDHG (SEQ ID NO: 323) comprising an RVD of NH, which recognizes guanine. A Xanthomonas spp.-derived repeat unit can have a sequence of LTPDQVVAIASNGGGKQALETVQRLLPVLCQDHG (SEQ ID NO: 324) comprising an RVD of NG, which recognizes thymidine. A Xanthomonas spp.-derived repeat unit can have a sequence of LTPDQVVAIASNIGGKQALETVQRLLPVLCQDHG (SEQ ID NO: 325) comprising an RVD of NI, which recognizes adenosine. A Xanthomonas spp.-derived repeat unit can have a sequence of LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG (SEQ ID NO: 326) comprising an RVD of HD, which recognizes cytosine.
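Because the four repeat units of SEQ ID NO: 323 through SEQ ID NO: 326 differ only at the RVD, a repeat array for a given DNA sequence can be sketched by swapping the RVD into a shared backbone. The Python example below is a minimal illustration under that observation; the names `BASE_TO_RVD`, `tale_repeat`, and `tale_repeat_array`, and the four-base input `GATC`, are hypothetical.

```python
# Base-to-RVD pairings stated above for SEQ ID NO: 323-326.
BASE_TO_RVD = {"G": "NH", "T": "NG", "A": "NI", "C": "HD"}

# Shared flanks of SEQ ID NO: 323-326 on either side of the RVD.
BACKBONE_1_11 = "LTPDQVVAIAS"
BACKBONE_14_34 = "GGKQALETVQRLLPVLCQDHG"


def tale_repeat(base: str) -> str:
    """Return the 34-residue repeat unit whose RVD recognizes `base`."""
    return BACKBONE_1_11 + BASE_TO_RVD[base.upper()] + BACKBONE_14_34


def tale_repeat_array(dna: str) -> list:
    """Return one repeat unit per base of `dna`, in 5' to 3' order."""
    return [tale_repeat(base) for base in dna]


# Hypothetical four-base input, chosen only to exercise all four RVDs.
array = tale_repeat_array("GATC")
```

This covers only the full repeat units; N-terminal and C-terminal sequences and any half-repeat would still have to be added around the array.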

A Xanthomonas spp.-derived repeat unit can also comprise a modified Xanthomonas spp.-derived repeat unit enhanced for specific recognition of a nucleotide or base pair. A TALE described herein can comprise one or more wild-type Xanthomonas spp.-derived repeat units, one or more modified Xanthomonas spp.-derived repeat units, or a combination thereof. In some embodiments, a modified Xanthomonas spp.-derived repeat unit can comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, or 29 mutations that can enhance recognition of a specific nucleotide or base pair. In some embodiments, a modified Xanthomonas spp.-derived repeat unit can comprise more than 1 modification, for example 1 to 5 modifications, 5 to 10 modifications, 10 to 15 modifications, 15 to 20 modifications, 20 to 25 modifications, or 25 to 29 modifications. In some embodiments, a TALE can comprise more than one modified Xanthomonas spp.-derived repeat unit, wherein each of the modified Xanthomonas spp.-derived repeat units can have a different number of modifications.

In some embodiments, a TALE of the present disclosure can have the full length naturally occurring N-terminus of a naturally occurring Xanthomonas spp.-derived protein, such as the N-terminus of SEQ ID NO: 299. The N-terminus sequence in SEQ ID NO: 299 is indicated by underlining.

In some embodiments, a TALE of the present disclosure can comprise the amino acid residues at position 1 (N) through position 137 (M) of the naturally occurring Xanthomonas spp.-derived protein as follows:

(SEQ ID NO: 300) MVDLRTLGYSQQQQEKIKPKVRSTVAQHHEALVGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEATHEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDTGQLLKIAKRGGVTAVEAVHAWRNALTGAPLN.

The amino acid sequence set forth in SEQ ID NO: 300 includes an M added to the N-terminus, which is not present in the wild-type N-terminus region of a TALE protein. The N-terminus fragment sequence set out in SEQ ID NO: 300 is generated by deleting amino acids N+288 through N+137 of the N-terminus region of a TALE protein and adding an M, such that amino acids N+136 through N+1 of the N-terminus region of the TALE protein are present.

In some embodiments, the N-terminus can be truncated such that the fragment of the N-terminus includes amino acids from position 1 (N) through position 120 (K) of the naturally occurring Xanthomonas spp.-derived protein as follows:

(SEQ ID NO: 301) KPKVRSTVAQHHEALVGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEATHEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDTGQLLKIAKRGGVTAVEAVHAWRNALTGAPLN.

In some embodiments, the N-terminus can be truncated such that the fragment of the N-terminus includes amino acids from position 1 (N) through position 115 (S) of the naturally occurring Xanthomonas spp.-derived protein as follows:

(SEQ ID NO: 321) STVAQHHEALVGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEATHEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDTGQLLKIAKRGGVTAVEAVHAWRNALTGAPLN.

In some embodiments, the N-terminus can be truncated such that the fragment of the N-terminus includes amino acids from position 1 (N) through position 110 (H) of the naturally occurring Xanthomonas spp.-derived protein as follows:

(SEQ ID NO: 447) HHEALVGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEATHEAIVGVGKQWSGARALEALLTVAGELRGPPLQLDTGQLLKIAKRGGVTAVEAVHAWRNALTGAPLN.

In some embodiments, any truncation of the naturally occurring Xanthomonas spp.-derived protein can be used at the N-terminus of a TALE disclosed herein. The naturally occurring N-terminus of Xanthomonas spp. can be truncated to amino acid residues at positions 1 to 50, 1 to 70, 1 to 100, 1 to 120, 1 to 130, 10 to 40, 60 to 100, or 100 to 120 and used at the N-terminus of the TALE.

FIGS. 1A-1C show schematics of the domain structure of a TALE protein (not drawn to scale). ‘N’ and ‘C’ indicate the amino and carboxy termini, respectively. The TALE repeat domain comprising TALE repeat units, N-Cap and C-Cap regions are labeled and the residue numbering scheme for the N-Cap and C-Cap regions and the N-terminus and C-terminus fragments are indicated. FIG. 1A includes the full-length N-cap region that extends from amino acid position N+1 to N+288 and full-length C-cap region that extends from amino acid position C+1 through C+278. FIG. 1B provides a schematic of a DNA binding protein comprising TALE repeat units and a truncated N-terminus that extends from amino acid position N+1 to N+136 (the notation N+137 indicates that a methionine added to the N-terminus increases the length to 137) and a truncated C-terminus that extends from amino acid position C+1 through C+63. FIG. 1C provides a schematic of a DNA binding protein comprising TALE repeat units and a truncated N-terminus that extends from amino acid position N+1 to N+115 and a truncated C-terminus that extends from amino acid position C+1 through C+63. In certain cases, the last repeat domain may be a half-repeat or a partial repeat as disclosed herein.

In some embodiments, a TALE of the present disclosure can have a DNA binding domain, in which the final full length repeat unit of 33-35 amino acid residues is followed by a half-repeat also derived from Xanthomonas spp. The half repeat can have 15 to 23 amino acid residues, for example, the half repeat can have 19 amino acid residues. In particular embodiments, the half-repeat can have a sequence as set forth in LTPQQVVAIASNGGGRPALE (SEQ ID NO: 297). In some embodiments, the half-repeat can have a sequence as set forth in SEQ ID NO: 327, 328, 329, 330, 331, 332, 333, or 334.

TABLE 5 Xanthomonas Repeat Sequences

SEQ ID NO | Amino Acid Sequence | Description
SEQ ID NO: 323 | LTPDQVVAIASNHGGKQALETVQRLLPVLCQDHG | RVD of NH recognizing guanine
SEQ ID NO: 324 | LTPDQVVAIASNGGGKQALETVQRLLPVLCQDHG | RVD of NG recognizing thymidine
SEQ ID NO: 325 | LTPDQVVAIASNIGGKQALETVQRLLPVLCQDHG | RVD of NI recognizing adenosine
SEQ ID NO: 326 | LTPDQVVAIASHDGGKQALETVQRLLPVLCQDHG | RVD of HD recognizing cytosine
SEQ ID NO: 297 | LTPQQVVAIASNGGGRPALE | Half repeat
SEQ ID NO: 327 | LTPEQVVAIASNGGGRPALE | Half repeat
SEQ ID NO: 328 | LTPDQVVAIASNGGGRPALE | Half repeat
SEQ ID NO: 329 | LTPEQVVAIASNIGGRPALE | Half repeat
SEQ ID NO: 330 | LTPDQVVAIASNIGGRPALE | Half repeat
SEQ ID NO: 331 | LTPEQVVAIASHDGGRPALE | Half repeat
SEQ ID NO: 332 | LTPDQVVAIASHDGGRPALE | Half repeat
SEQ ID NO: 333 | LTPEQVVAIASNHGGRPALE | Half repeat
SEQ ID NO: 334 | LTPDQVVAIASNHGGRPALE | Half repeat

In some embodiments, a TALE of the present disclosure can have the full length naturally occurring C-terminus of a naturally occurring Xanthomonas spp.-derived protein, such as the C-terminus of SEQ ID NO: 299. The C-terminus of the TALE protein sequence set forth in SEQ ID NO: 299 is italicized. In some embodiments, the C-terminus can be a fragment of the full length naturally occurring C-terminus of a naturally occurring Xanthomonas spp.-derived protein. In some embodiments, the C-terminus can be less than 250 amino acids long. In some embodiments, the C-terminus can be position 1 (S) through position 278 (Q) of the naturally occurring Xanthomonas spp.-derived protein as follows: SIVAQLSRPDPALAALTNDHLVALACLGGRPALDAVKKGLPHAPALIKRTNRRIPERTSHRVADHAQVVRVLGFFQCHSHPAQAFDDAMTQFGMSRHGLLQLFRRVGVTELEARSGTLPPASQRWDRILQASGMKRAKPSPTSTQTPDQASLHAFADSLERDLDAPSPTHEGDQRRASSRKRSRSDRAVTGPSAQQSFEVRAPEQRDALHLPLSWRVKRPRTSIGGGLPDPGTPTAADLAASSTVMREQDEDPFAGAADDFPAFNEEELAWLMELLPQ (SEQ ID NO: 302). In some embodiments, any truncation of the full length naturally occurring C-terminus of a naturally occurring Xanthomonas spp.-derived protein can be used at the C-terminus of a TALE of the present disclosure. For example, in some embodiments, the naturally occurring C-terminus of Xanthomonas spp. can be truncated to amino acid residues at position 1 (S) to position 63 (A) as follows: SIVAQLSRPDPALAALTNDHLVALACLGGRPALDAVKKGLPHAPALIKRTNRRIPERTSHRVA (SEQ ID NO: 298). The naturally occurring C-terminus of Xanthomonas spp. can be truncated to amino acid residues at positions 1 to 50 and used at the C-terminus of the engineered DNA binding domain. The naturally occurring C-terminus of Xanthomonas spp. can be truncated to amino acid residues at positions 1 to 63, 1 to 50, 1 to 70, 1 to 100, 1 to 120, 1 to 130, 10 to 40, 60 to 100, or 100 to 120 and used at the C-terminus of the engineered DNA binding domain.

The terms “N-cap” polypeptide and “N-terminal sequence” are used to refer to an amino acid sequence (polypeptide) that flanks the N-terminal portion of the first TALE repeat unit. The N-cap sequence can be of any length (including no amino acids), so long as the TALE-repeat unit(s) function to bind DNA. An N-terminal fragment and grammatical equivalents thereof refers to a shortened sequence of an N-terminal sequence, which fragment is sufficient for the TALE repeat units to bind to DNA.

The term “C-cap” or “C-terminal region” refers to optionally present amino acid sequences that may be flanking the C-terminal portion of the last TALE repeat unit. The C-cap can also comprise any part of a terminal C-terminal TALE repeat, including 0 residues, truncations of a TALE repeat or a full TALE repeat. A C-terminal fragment and grammatical equivalents thereof refers to a shortened sequence of a C-terminal sequence, which fragment is sufficient for the TALE repeat units to bind to DNA.

Animal Pathogen Derived Modular Nucleic Acid Binding Domains

The present disclosure provides a modular nucleic acid binding domain derived from an animal pathogen protein (MAP-NBD). The MAP-NBD can comprise a plurality of repeat units, wherein a repeat unit of the plurality of repeat units recognizes a single target nucleotide, base pair, or both.

In some embodiments, the repeat unit can be derived from an animal pathogen, and the resulting domain can be referred to as a non-naturally occurring modular nucleic acid binding domain derived from an animal pathogen protein, or “modular animal pathogen-nucleic acid binding domain” (MAP-NBD). For example, in some cases, the animal pathogen can be from the Gram-negative bacterium genus, Legionella. In other cases, the animal pathogen can be from Burkholderia. In some cases, the animal pathogen can be from Paraburkholderia. In other cases, the animal pathogen can be from Francisella.

In particular embodiments, the repeat unit can be derived from a species of the genus of Legionella, such as Legionella quateirensis, the genus of Burkholderia, the genus of Paraburkholderia, or the genus of Francisella. In some embodiments, the repeat unit can comprise from 19 amino acid residues to 35 amino acid residues. In particular embodiments, the repeat unit can comprise 33 amino acid residues. In other embodiments, the repeat unit can comprise 35 amino acid residues. In some embodiments, the MAP-NBD is non-naturally occurring and comprises a plurality of repeat units, wherein a repeat unit of the plurality of repeat units recognizes a single target nucleic acid.

In some embodiments, a repeat unit can be derived from a Legionella quateirensis protein with the following sequence:

(SEQ ID NO: 281) MPDLELNFAIPLHLFDDETVFTHDATNDNSQASSSYSSKSSPASANARKRTSRKEMSGPPSKEPANTKSRRANSQNNKLSLADRLTKYNIDEEFYQTRSDSLLSLNYTKKQIERLILYKGRTSAVQQLLCKHEELLNLISPDGLGHKELIKIAARNGGGNNLIAVLSCYAKLKEMGFSSQQIIRMVSHAGGANNLKAVTANHDDLQNMGFNVEQIVRMVSHNGGSKNLKAVTDNHDDLKNMGFNAEQIVRMVSHGGGSKNLKAVTDNHDDLKNMGFNAEQIVSMVSNNGGSKNLKAVTDNHDDLKNMGFNAEQIVSMVSNGGGSLNLKAVKKYHDALKDRGFNTEQIVRMVSHDGGSLNLKAVKKYHDALRERKFNVEQIVSIVSHGGGSLNLKAVKKYHDVLKDREFNAEQIVRMVSHDGGSLNLKAVTDNHDDLKNMGFNAEQIVRMVSHKGGSKNLALVKEYFPVFSSFHFTADQIVALICQSKQCFRNLKKNHQQWKNKGLSAEQIVDLILQETPPKPNFNNTSSSTPSPSAPSFFQGPSTPIPTPVLDNSPAPIFSNPVCFFSSRSENNTEQYLQDSTLDLDSQLGDPTKNFNVNNFWSLFPFDDVGYHPHSNDVGYHLHSDEESPFFDF.

In some embodiments, a repeat from a Legionella quateirensis protein can comprise a repeat with a canonical RVD or a non-canonical RVD. In some embodiments, a canonical RVD can comprise NN, NG, or HD. In some embodiments, a non-canonical RVD can comprise RN, HA, HN, HG, or HK.

In some embodiments, a repeat of SEQ ID NO: 282 comprises an RVD of HA and primarily recognizes a base of adenine (A). In some embodiments, a repeat of SEQ ID NO: 283 comprises an RVD of HN and recognizes a base comprising guanine (G). In some embodiments, a repeat of SEQ ID NO: 284 comprises an RVD of HG and recognizes a base comprising thymine (T). In some embodiments, a repeat of SEQ ID NO: 285 comprises an RVD of NN and recognizes a base comprising guanine (G). In some embodiments, a repeat of SEQ ID NO: 286 comprises an RVD of NG and recognizes a base comprising thymine (T). In some embodiments, a repeat of SEQ ID NO: 287 comprises an RVD of HD and recognizes a base comprising cytosine (C). In some embodiments, a repeat of SEQ ID NO: 288 comprises an RVD of HG and recognizes a base comprising thymine (T). In some embodiments, a repeat of SEQ ID NO: 289 comprises an RVD of HD and recognizes a base comprising cytosine (C). In some embodiments, a half-repeat of SEQ ID NO: 290 comprises an RVD of HK and recognizes a base comprising guanine (G). In some embodiments, a repeat of SEQ ID NO: 357 comprises an RVD of RN and recognizes a base comprising guanine (G).

TABLE 6 illustrates exemplary repeats from Legionella quateirensis, Burkholderia, Paraburkholderia, or Francisella that can make up a MAP-NBD of the present disclosure and the RVD at positions 12 and 13 of the particular repeat. A MAP-NBD of the present disclosure can comprise at least one of the repeats disclosed in TABLE 6 including any one of SEQ ID NO: 357, SEQ ID NO: 282-SEQ ID NO: 290, or SEQ ID NO: 358-SEQ ID NO: 446. A MAP-NBD of the present disclosure can comprise any combination of repeats disclosed in TABLE 6 including any one of SEQ ID NO: 357, SEQ ID NO: 282-SEQ ID NO: 290, or SEQ ID NO: 358-SEQ ID NO: 446.

TABLE 6 Animal Pathogen Derived Repeat Units

SEQ ID NO       Organism          Repeat Unit Sequence  RVD
SEQ ID NO: 357  L. quateirensis   LGHKELIKIAARNGGGNNLIAVLSCYAKLKEMG  RN
SEQ ID NO: 282  L. quateirensis   FSSQQIIRMVSHAGGANNLKAVTANHDDLQNMG  HA
SEQ ID NO: 283  L. quateirensis   FNVEQIVRMVSHNGGSKNLKAVTDNHDDLKNMG  HN
SEQ ID NO: 284  L. quateirensis   FNAEQIVRMVSHGGGSKNLKAVTDNHDDLKNMG  HG
SEQ ID NO: 285  L. quateirensis   FNAEQIVSMVSNNGGSKNLKAVTDNHDDLKNMG  NN
SEQ ID NO: 286  L. quateirensis   FNAEQIVSMVSNGGGSLNLKAVKKYHDALKDRG  NG
SEQ ID NO: 287  L. quateirensis   FNTEQIVRMVSHDGGSLNLKAVKKYHDALRERK  HD
SEQ ID NO: 288  L. quateirensis   FNVEQIVSIVSHGGGSLNLKAVKKYHDVLKDRE  HG
SEQ ID NO: 289  L. quateirensis   FNAEQIVRMVSHDGGSLNLKAVTDNHDDLKNMG  HD
SEQ ID NO: 290  L. quateirensis   FNAEQIVRMVSHKGGSKNL  HK (half-repeat)
SEQ ID NO: 358  L. quateirensis   FSAEQIVRIAAHDGGSRNIEAVQQAQHVLKELG  HD
SEQ ID NO: 359  L. quateirensis   FSAEQIVSIVAHDGGSRNIEAVQQAQHILKELG  HD
SEQ ID NO: 360  L. quateirensis   FSRQQILRIASHDGGSKNIAAVQKFLPKLMNFGFN  HD
SEQ ID NO: 361  L. quateirensis   FSAEQIVRIAAHDGGSLNIDAVQQAQQALKELG  HD
SEQ ID NO: 362  L. quateirensis   FSTEQIVCIAGHGGGSLNIKAVLLAQQALKDLG  HG
SEQ ID NO: 363  L. quateirensis   FSSEQIVRVAAHGGGSLNIKAVLQAHQALKELD  HG
SEQ ID NO: 364  L. quateirensis   FSAEQIVHIAAHGGGSLNIKAILQAHQTLKELN  HG
SEQ ID NO: 365  L. quateirensis   FSAEQIVRIAAHIGGSRNIEAIQQAHHALKELG  HI
SEQ ID NO: 366  L. quateirensis   FSAEQIVRIAAHIGGSHNLKAVLQAQQALKELD  HI
SEQ ID NO: 367  L. quateirensis   FSAKHIVRIAAHIGGSLNIKAVQQAQQALKELG  HI
SEQ ID NO: 368  L. quateirensis   FNAEQIVRMVSHKGGSKNLALVKEYFPVFSSFH  HK
SEQ ID NO: 369  L. quateirensis   FNAEQIVRMVSHKGGSKNLALVKEYFPVFSSFHFT  HK
SEQ ID NO: 370  L. quateirensis   FSADQIVRIAAHKGGSHNIVAVQQAQQALKELD  HK
SEQ ID NO: 371  L. quateirensis   FNVEQIVRMVSHNGGSKNLKAVTDNHDDLKNMGFN  HN
SEQ ID NO: 372  L. quateirensis   FSADQVVKIAGHSGGSNNIAVMLAVFPRLRDFGFK  HS
SEQ ID NO: 373  L. quateirensis   FSAEQIVSIAAHVGGSHNIEAVQKAHQALKELD  HV
SEQ ID NO: 374  L. quateirensis   FNAEQIVSMVSNNGGSKNLKAVTDNHDDLKNMGFN  NN
SEQ ID NO: 375  L. quateirensis   FSHKELIKIAARNGGGNNLIAVLSCYAKLKEMG  RN
SEQ ID NO: 376  L. quateirensis   FSHKELIKIAARNGGGNNLIAVLSCYAKLKEMGFS  RN
SEQ ID NO: 377  Burkholderia      FSSGETVGATVGAGGTETVAQGGTASNTTVSSGGY  GA
SEQ ID NO: 378  Burkholderia      FSGGMATSTTVGSGGTQDVLAGGAAVGGTVGTGGV  GS
SEQ ID NO: 379  Burkholderia      FSAADIVKIAGKIGGAQALQAFITHRAALIQAGFS  KI
SEQ ID NO: 380  Burkholderia      FNPTDIVKIAGNDGGAQALQAVLELEPALRERGFS  ND
SEQ ID NO: 381  Burkholderia      FNPTDIVRMAGNDGGAQALQAVFELEPAFRERSFS  ND
SEQ ID NO: 382  Burkholderia      FNPTDIVRMAGNDGGAQALQAVLELEPAFRERGFS  ND
SEQ ID NO: 383  Burkholderia      FSQVDIVKIASNDGGAQALYSVLDVEPTFRERGFS  ND
SEQ ID NO: 384  Burkholderia      FSRADIVKIAGNDGGAQALYSVLDVEPPLRERGFS  ND
SEQ ID NO: 385  Burkholderia      FSRGDIVKIAGNDGGAQALYSVLDVEPPLRERGFS  ND
SEQ ID NO: 386  Burkholderia      FNRADIVRIAGNGGGAQALYSVRDAGPTLGKRGFS  NG
SEQ ID NO: 387  Burkholderia      FRQADIVKIASNGGSAQALNAVIKLGPTLRQRGFS  NG
SEQ ID NO: 388  Burkholderia      FRQADIVKMASNGGSAQALNAVIKLGPTLRQRGFS  NG
SEQ ID NO: 389  Burkholderia      FSRADIVKIAGNGGGAQALQAVLELEPTFRERGFS  NG
SEQ ID NO: 390  Burkholderia      FSRADIVRIAGNGGGAQALYSVLDVGPTLGKRGFS  NG
SEQ ID NO: 391  Burkholderia      FSRGDIVRIAGNGGGAQALQAVLELEPTLGERGFS  NG
SEQ ID NO: 392  Burkholderia      FSRADIVKIAGNGGGAQALQAVITHRAALTQAGFS  NG
SEQ ID NO: 393  Burkholderia      FSRGDTVKIAGNIGGAQALQAVLELEPTLRERGFS  NI
SEQ ID NO: 394  Burkholderia      FNPTDIVKIAGNIGGAQALQAVLELEPAFRERGFS  NI
SEQ ID NO: 395  Burkholderia      FSAADIVKIAGNIGGAQALQAIFTHRAALIQAGFS  NI
SEQ ID NO: 396  Burkholderia      FSAADIVKIAGNIGGAQALQAVITHRATLTQAGFS  NI
SEQ ID NO: 397  Burkholderia      FSATDIVKIASNIGGAQALQAVISRRAALIQAGFS  NI
SEQ ID NO: 398  Burkholderia      FSQPDIVKIAGNIGGAQALQAVLELEPAFRERGFS  NI
SEQ ID NO: 399  Burkholderia      FSRADIVKIAGNIGGAQALQAVLELESTFRERSFN  NI
SEQ ID NO: 400  Burkholderia      FSRADIVKIAGNIGGAQALQAVLELESTLRERSFN  NI
SEQ ID NO: 401  Burkholderia      FSRGDIVKMAGNIGGAQALQAGLELEPAFRERGFS  NI
SEQ ID NO: 402  Burkholderia      FSRGDIVKMAGNIGGAQALQAVLELEPAFHERSFC  NI
SEQ ID NO: 403  Burkholderia      FTLTDIVKMAGNIGGAQALKAVLEHGPTLRQRDLS  NI
SEQ ID NO: 404  Burkholderia      FTLTDIVKMAGNIGGAQALKVVLEHGPTLRQRDLS  NI
SEQ ID NO: 405  Burkholderia      FNPTDIVKIAGNNGGAQALQAVLELEPALRERGFS  NN
SEQ ID NO: 406  Burkholderia      FNPTDIVKIAGNNGGAQALQAVLELEPALRERSFS  NN
SEQ ID NO: 407  Burkholderia      FNPTDMVKIAGNNGGAQALQAVLELEPALRERGFS  NN
SEQ ID NO: 408  Burkholderia      FSAADIVKIASNNGGAQALQALIDHWSTLSGKTKA  NN
SEQ ID NO: 409  Burkholderia      FSAADIVKIASNNGGAQALQAVISRRAALIQAGFS  NN
SEQ ID NO: 410  Burkholderia      FSAADIVKIASNNGGAQALQAVITHRAALAQAGFS  NN
SEQ ID NO: 411  Burkholderia      FSAADIVKIASNNGGARALQALIDHWSTLSGKTKA  NN
SEQ ID NO: 412  Burkholderia      FTLTDIVEMAGNNGGAQALKAVLEHGSTLDERGFT  NN
SEQ ID NO: 413  Burkholderia      FTLTDIVKMAGNNGGAQALKAVLEHGPTLDERGFT  NN
SEQ ID NO: 414  Burkholderia      FTLTDIVKMAGNNGGAQALKVVLEHGPTLRQRGFS  NN
SEQ ID NO: 415  Burkholderia      FTLTDIVKMASNNGGAQALKAVLEHGPTLDERGFT  NN
SEQ ID NO: 416  Burkholderia      FSAADIVKIAGNSGGAQALQAVISHRAALTQAGFS  NS
SEQ ID NO: 417  Burkholderia      FSGGDAVSTVVRSGGAQSVASGGTASGTTVSAGAT  RS
SEQ ID NO: 418  Burkholderia      FRQTDIVKMAGSGGSAQALNAVIKHGPTLRQRGFS  SG
SEQ ID NO: 419  Burkholderia      FSLIDIVEIASNGGAQALKAVLKYGPVLTQAGRS  SN
SEQ ID NO: 420  Burkholderia      FSGGDAAGTVVSSGGAQNVTGGLASGTTVAGGAA  SS
SEQ ID NO: 421  Paraburkholderia  FNLTDIVEMAANSGGAQALKAVLEHGPTLRQRGLS  NS
SEQ ID NO: 422  Paraburkholderia  FNRASIVKIAGNSGGAQALQAVLKHGPTLDERGFN  NS
SEQ ID NO: 423  Paraburkholderia  FSQANIVKMAGNSGGAQALQAVLDLELVFRERGFS  NS
SEQ ID NO: 424  Paraburkholderia  FSQPDIVKMAGNSGGAQALQAVLDLELAFRERGFS  NS
SEQ ID NO: 425  Paraburkholderia  FSLIDIVEIASNGGAQALKAVLKYGPVLMQAGRS  SN
SEQ ID NO: 426  Francisella       YKSEDIIRLASHDGGSVNLEAVLRLHSQLTRLG  HD
SEQ ID NO: 427  Francisella       YKPEDIIRLASHGGGSVNLEAVLRLNPQLIGLG  HG
SEQ ID NO: 428  Francisella       YKSEDIIRLASHGGGSVNLEAVLRLHSQLTRLG  HG
SEQ ID NO: 429  Francisella       YKSEDIIRLASHGGGSVNLEAVLRLNPQLIGLG  HG
SEQ ID NO: 430  Paraburkholderia  FNLTDIVEMAGKGGGAQALKAVLEHGPTLRQRGFN  KG
SEQ ID NO: 431  Paraburkholderia  FRQADIIKIAGNDGGAQALQAVIEHGPTLRQHGFN  ND
SEQ ID NO: 432  Paraburkholderia  FSQADIVKIAGNDGGTQALHAVLDLERMLGERGFS  ND
SEQ ID NO: 433  Paraburkholderia  FSRADIVKIAGNGGGAQALKAVLEHEATLDERGFS  NG
SEQ ID NO: 434  Paraburkholderia  FSRADIVRIAGNGGGAQALYSVLDVEPTLGKRGFS  NG
SEQ ID NO: 435  Paraburkholderia  FSQPDIVKMASNIGGAQALQAVLELEPALRERGFS  NI
SEQ ID NO: 436  Paraburkholderia  FSQPDIVKMAGNIGGAQALQAVLSLGPALRERGFS  NI
SEQ ID NO: 437  Paraburkholderia  FSQPEIVKIAGNIGGAQALHTVLELEPTLHKRGFN  NI
SEQ ID NO: 438  Paraburkholderia  FSQSDIVKIAGNIGGAQALQAVLDLESMLGKRGFS  NI
SEQ ID NO: 439  Paraburkholderia  FSQSDIVKIAGNIGGAQALQAVLELEPTLRESDFR  NI
SEQ ID NO: 440  Paraburkholderia  FNPTDIVKIAGNKGGAQALQAVLELEPALRERGFN  NK
SEQ ID NO: 441  Paraburkholderia  FSPTDIIKIAGNNGGAQALQAVLDLELMLRERGFS  NN
SEQ ID NO: 442  Paraburkholderia  FSQADIVKIAGNNGGAQALYSVLDVEPTLGKRGFS  NN
SEQ ID NO: 443  Paraburkholderia  FSRGDIVTIAGNNGGAQALQAVLELEPTLRERGFN  NN
SEQ ID NO: 444  Paraburkholderia  FSRIDIVKIAANNGGAQALHAVLDLGPTLRECGFS  NN
SEQ ID NO: 445  Paraburkholderia  FSQADIVKIVGNNGGAQALQAVFELEPTLRERGFN  NN
SEQ ID NO: 446  Paraburkholderia  FSQPDIVRITGNRGGAQALQAVLALELTLRERGFS  NR

In any one of the animal pathogen-derived repeat domains of SEQ ID NO: 357, SEQ ID NO: 282-SEQ ID NO: 290, or SEQ ID NO: 358-SEQ ID NO: 446, there can be considerable sequence divergence between repeats of a MAP-NBD outside of the RVD.
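As a minimal sketch of how the RVD at positions 12 and 13 of a repeat relates to the recognized base, the following Python snippet reads the RVD from each repeat and looks up the base recited above for that RVD. The function names and the dictionary are illustrative assumptions and not part of the disclosure. The lookup covers only the RVD specificities recited in this section for SEQ ID NO: 282-SEQ ID NO: 290 and SEQ ID NO: 357.

```python
# RVD-to-base assignments as recited above for SEQ ID NO: 282-290 and 357.
# Any other RVD is outside this illustrative table and raises KeyError.
RVD_TO_BASE = {
    "HA": "A", "HN": "G", "HG": "T", "NN": "G",
    "NG": "T", "HD": "C", "HK": "G", "RN": "G",
}


def extract_rvd(repeat):
    """Return the two residues at positions 12 and 13 (1-based) of a repeat."""
    if len(repeat) < 13:
        raise ValueError("repeat is too short to contain an RVD")
    return repeat[11:13]


def predicted_target(repeats):
    """Concatenate the base recognized by each repeat, in repeat order."""
    return "".join(RVD_TO_BASE[extract_rvd(repeat)] for repeat in repeats)


# Repeats copied from TABLE 6 (SEQ ID NO: 282, 283, 284, and 287).
example_repeats = [
    "FSSQQIIRMVSHAGGANNLKAVTANHDDLQNMG",  # RVD HA
    "FNVEQIVRMVSHNGGSKNLKAVTDNHDDLKNMG",  # RVD HN
    "FNAEQIVRMVSHGGGSKNLKAVTDNHDDLKNMG",  # RVD HG
    "FNTEQIVRMVSHDGGSLNLKAVKKYHDALRERK",  # RVD HD
]

print(predicted_target(example_repeats))  # AGTC
```

The sketch treats each repeat independently, which reflects the one-repeat-to-one-base reading of the text; it does not model binding strength or context effects between neighboring repeats.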

In some embodiments, a MAP-NBD of the present disclosure can comprise from 1 to 50 animal pathogen-derived repeat units. In some embodiments, a MAP-NBD of the present disclosure can comprise between 9 and 36 animal pathogen-derived repeat units. Preferably, in some embodiments, a MAP-NBD of the present disclosure can comprise between 12 and 30 animal pathogen-derived repeat units. A MAP-NBD described herein can comprise 5 to 10, 10 to 15, 15 to 20, 20 to 25, 25 to 30, 30 to 35, or 35 to 40, e.g., 15 to 25, animal pathogen-derived repeat units. A MAP-NBD described herein can comprise 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 or 40 animal pathogen-derived repeat units.

A MAP-NBD described herein can comprise 5, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 or 40 animal pathogen-derived repeat units.

An animal pathogen-derived repeat unit can be derived from a wild-type repeat unit, such as any one of SEQ ID NO: 357, SEQ ID NO: 282-SEQ ID NO: 290, or SEQ ID NO: 358-SEQ ID NO: 446. An animal pathogen-derived repeat unit can also comprise a modified animal pathogen-derived repeat unit enhanced for specific recognition of a nucleotide or base pair. A MAP-NBD described herein can comprise one or more wild-type animal pathogen-derived repeat units, one or more modified animal pathogen-derived repeat units, or a combination thereof. In some embodiments, a modified animal pathogen-derived repeat unit can comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, or 29 mutations that can enhance recognition of a specific nucleotide or base pair. In some embodiments, a modified animal pathogen-derived repeat unit can comprise more than 1 modification, for example 1 to 5 modifications, 5 to 10 modifications, 10 to 15 modifications, 15 to 20 modifications, 20 to 25 modifications, or 25 to 29 modifications. In some embodiments, a MAP-NBD can comprise more than one modified animal pathogen-derived repeat unit, wherein each of the modified animal pathogen-derived repeat units can have a different number of modifications.

In some embodiments, a MAP-NBD of the present disclosure can have the full length naturally occurring N-terminus of a naturally occurring Legionella quateirensis-derived protein, such as the N-terminus of SEQ ID NO: 281. An N-terminus can be the full length N-terminus sequence and can have a sequence of MPDLELNFAIPLHLFDDETVFTHDATNDNSQASSSYSSKSSPASANARKRTSRKEMSGPPSKEPANTKSRRANSQNNKLSLADRLTKYNIDEEFYQTRSDSLLSLNYTKKQIERLILYKGRTSAVQQLLCKHEELLNLISPDG (SEQ ID NO: 291). In some embodiments, any truncation of SEQ ID NO: 291 can be used as the N-terminus in a MAP-NBD of the present disclosure. For example, in some embodiments, a MAP-NBD comprises a truncated N-terminus including amino acid residues at position 1 (G) to position 137 (S) of the naturally occurring Legionella quateirensis N-terminus as follows: NFAIPLHLFDDETVFTHDATNDNSQASSSYSSKSSPASANARKRTSRKEMSGPPSKEPANTKSRRANSQNNKLSLADRLTKYNIDEEFYQTRSDSLLSLNYTKKQIERLILYKGRTSAVQQLLCKHEELLNLISPDG (SEQ ID NO: 335). As another example, in some embodiments, a MAP-NBD comprises a truncated N-terminus including amino acid residues at position 1 (G) to position 120 (S) of the naturally occurring Legionella quateirensis N-terminus as follows: DATNDNSQASSSYSSKSSPASANARKRTSRKEMSGPPSKEPANTKSRRANSQNNKLSLADRLTKYNIDEEFYQTRSDSLLSLNYTKKQIERLILYKGRTSAVQQLLCKHEELLNLISPDG (SEQ ID NO: 304). In some embodiments, a MAP-NBD comprises a truncated N-terminus including amino acid residues at position 1 (G) to position 115 (K) of the naturally occurring Legionella quateirensis N-terminus as follows: NSQASSSYSSKSSPASANARKRTSRKEMSGPPSKEPANTKSRRANSQNNKLSLADRLTKYNIDEEFYQTRSDSLLSLNYTKKQIERLILYKGRTSAVQQLLCKHEELLNLISPDG (SEQ ID NO: 322). In some embodiments, any truncation of the naturally occurring Legionella quateirensis-derived protein can be used at the N-terminus of a DNA binding domain disclosed herein. The naturally occurring N-terminus of Legionella quateirensis can be truncated to amino acid residues at positions 1 to 50, 1 to 70, 1 to 100, 1 to 120, 1 to 130, 10 to 40, 60 to 100, or 100 to 120 and used at the N-terminus of the MAP-NBD.

In some embodiments, a MAP-NBD of the present disclosure can have the full length naturally occurring C-terminus of a naturally occurring Legionella quateirensis-derived protein. In some embodiments, a MAP-NBD of the present disclosure can have at its C-terminus amino acid residues at position 1 (A) to position 176 (F) of the naturally occurring Legionella quateirensis-derived protein as follows:

(SEQ ID NO: 305) ALVKEYFPVFSSFHFTADQIVALICQSKQCFRNLKKNHQQWKNKGLSAEQIVDLILQETPPKPNFNNTSSSTPSPSAPSFFQGPSTPIPTPVLDNSPAPIFSNPVCFFSSRSENNTEQYLQDSTLDLDSQLGDPTKNFNVNNFWSLFPFDDVGYHPHSNDVGYHLHSDEESPFFDF.

In some embodiments, a MAP-NBD of the present disclosure can have at its C-terminus amino acid residues at position 1 (A) to position 63 (P) of the naturally occurring Legionella quateirensis-derived protein as follows:

(SEQ ID NO: 306) ALVKEYFPVFSSFHFTADQIVALICQSKQCFRNLKKNHQQWKNKGLSAEQIVDLILQETPPKP.
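The positional truncations described above amount to taking a 1-based, inclusive slice of a terminus sequence. The following Python sketch, in which the helper name `truncate` is an illustrative assumption, shows that residues 1 to 63 of SEQ ID NO: 305 reproduce SEQ ID NO: 306.

```python
def truncate(sequence, first, last):
    """Return residues first..last of a sequence (1-based, inclusive)."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("positions fall outside the sequence")
    return sequence[first - 1:last]


# SEQ ID NO: 305 as recited above, split in two pieces only for readability.
C_TERMINUS = (
    "ALVKEYFPVFSSFHFTADQIVALICQSKQCFRNLKKNHQQWKNKGLSAEQIVDLILQETPPKP"
    "NFNNTSSSTPSPSAPSFFQGPSTPIPTPVLDNSPAPIFSNPVCFFSSRSENNTEQYLQDSTLDLDSQLGDPTKNFNVNNFWSLFPFDDVGYHPHSNDVGYHLHSDEESPFFDF"
)

# SEQ ID NO: 306 as recited above.
C_TERMINUS_1_TO_63 = (
    "ALVKEYFPVFSSFHFTADQIVALICQSKQCFRNLKKNHQQWKNKGLSAEQIVDLILQETPPKP"
)

print(truncate(C_TERMINUS, 1, 63) == C_TERMINUS_1_TO_63)  # True
```

The same helper applies to the other recited ranges, for example `truncate(C_TERMINUS, 10, 40)`, with the caveat that the text does not state which ranges retain DNA binding.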

In some embodiments, the present disclosure provides methods for identifying an animal pathogen-derived repeat unit. For example, a consensus sequence can be defined comprising a first repeat motif, a spacer, and a second repeat motif. The consensus sequence can be 1xxx211x1xxx33x2x1xxxxxxxxx1xxxx1xxx211x1xxx33x2x1xxxxxxxxx1 (SEQ ID NO: 292), 1xxx211x1xxx33x2x1xxxxxxxxx1xxxxx1xxx211x1xxx33x2x1xxxxxxxxx1 (SEQ ID NO: 293), 1xxx211x1xxx33x2x1xxxxxxxxx1xxxxxx1xxx211x1xxx33x2x1xxxxxxxxx1 (SEQ ID NO: 294), 1xxx211x1xxx33x2x1xxxxxxxxx1xxxxxxx1xxx211x1xxx33x2x1xxxxxxxxx1 (SEQ ID NO: 295), or 1xxx211x1xxx33x2x1xxxxxxxxx1xxxxxxxx1xxx211x1xxx33x2x1xxxxxxxxx1 (SEQ ID NO: 296). For any one of SEQ ID NO: 292-SEQ ID NO: 296, x can be any amino acid residue, and 1, 2, and 3 are flexible residues that are defined as follows: 1 can be selected from any one of A, F, I, L, M, T, or V, 2 can be selected from any one of D, E, K, N, M, S, R, or Q, and 3 can be selected from any one of A, G, N, or S. Thus, in some embodiments, a MAP-NBD can be derived from an animal pathogen comprising the consensus sequence of SEQ ID NO: 292, SEQ ID NO: 293, SEQ ID NO: 294, SEQ ID NO: 295, or SEQ ID NO: 296. Any one of the consensus sequences of SEQ ID NO: 292-SEQ ID NO: 296 can be compared against all sequences downloaded from NCBI, MGRast, JGI, and EBI databases to identify matches corresponding to animal pathogen proteins containing DNA-binding repeat units.

In some embodiments, a MAP-NBD repeat unit can itself have a consensus sequence of 1xxx211x1xxx33x2x1xxxxxxxxx1 (SEQ ID NO: 452), wherein x can be any amino acid residue, and 1, 2, and 3 are flexible residues that are defined as follows: 1 can be selected from any one of A, F, I, L, M, T, or V, 2 can be selected from any one of D, E, K, N, M, S, R, or Q, and 3 can be selected from any one of A, G, N, or S.
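One way to apply the consensus sequences above to a protein sequence is to translate each symbol into a regular-expression character class and scan for hits. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the search is run on a single in-memory string rather than on a downloaded database, and the disclosure does not state which search tool was actually used. The spacer lengths of 4 to 8 residues are read directly from SEQ ID NO: 292-SEQ ID NO: 296.

```python
import re

# Character classes for the flexible positions 1, 2, and 3 defined above;
# "x" stands for any residue.
RESIDUE_CLASSES = {"1": "[AFILMTV]", "2": "[DEKNMSRQ]", "3": "[AGNS]", "x": "."}

REPEAT_CONSENSUS = "1xxx211x1xxx33x2x1xxxxxxxxx1"  # SEQ ID NO: 452

# Number of x positions between the two motifs in each tandem consensus.
SPACER_BY_SEQ_ID_NO = {292: 4, 293: 5, 294: 6, 295: 7, 296: 8}


def consensus_to_regex(consensus):
    """Translate a consensus string into a regular-expression pattern."""
    return "".join(RESIDUE_CLASSES[symbol] for symbol in consensus)


def find_tandem_motifs(protein):
    """Return sorted (start, seq_id_no) pairs; start is a 0-based index."""
    motif = consensus_to_regex(REPEAT_CONSENSUS)
    hits = []
    for seq_id_no, spacer in SPACER_BY_SEQ_ID_NO.items():
        # A zero-width lookahead reports overlapping hits as well.
        pattern = re.compile("(?=" + motif + "." * spacer + motif + ")")
        hits.extend((m.start(), seq_id_no) for m in pattern.finditer(protein))
    return sorted(hits)


# Test string only: two copies of the repeat of SEQ ID NO: 324 from TABLE 5,
# chosen because the match can be checked by hand.
test_protein = "LTPDQVVAIASNGGGKQALETVQRLLPVLCQDHG" * 2
print(find_tandem_motifs(test_protein))  # [(0, 294)]
```

Building one pattern per spacer length, rather than a single pattern with a variable-length gap, keeps each hit attributable to a specific consensus sequence.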

Mixed DNA Binding Domains

In some embodiments, the present disclosure provides DNA binding domains in which the repeat units, the N-terminus, and the C-terminus can be derived from any one of Ralstonia solanacearum, Xanthomonas spp., Legionella quateirensis, Burkholderia, Paraburkholderia, or Francisella. For example, the present disclosure provides a DNA binding domain wherein the plurality of repeat units are selected from any one of SEQ ID NO: 168-SEQ ID NO: 263 or SEQ ID NO: 336-SEQ ID NO: 356 and can further comprise an N-terminus and/or C-terminus from Xanthomonas spp. (N-termini: SEQ ID NO: 298, SEQ ID NO: 300, SEQ ID NO: 301, and SEQ ID NO: 321; C-termini: SEQ ID NO: 302 and SEQ ID NO: 298) or Legionella quateirensis (N-termini: SEQ ID NO: 304 or SEQ ID NO: 322; C-termini: SEQ ID NO: 305 and SEQ ID NO: 306). In some embodiments, the present disclosure provides modular DNA binding domains in which the repeat units can be from Ralstonia solanacearum (e.g., any one of SEQ ID NO: 168-SEQ ID NO: 263 or SEQ ID NO: 336-SEQ ID NO: 356), Xanthomonas spp. (e.g., any one of SEQ ID NO: 323-SEQ ID NO: 334), an animal pathogen such as Legionella quateirensis, Burkholderia, Paraburkholderia, or Francisella (e.g., any one of SEQ ID NO: 357, SEQ ID NO: 282-SEQ ID NO: 290, or SEQ ID NO: 358-SEQ ID NO: 446), or any combination thereof.

Nucleases for Genome Editing

Genome editing can include the process of modifying a DNA of a cell in order to introduce or knock out a target gene or a target gene region. In some instances, a subject may have a disease in which a protein is aberrantly expressed or completely lacking. One therapeutic strategy for treating this disease can be introduction of a target gene or a target gene region to correct the aberrant or missing protein. For example, genome editing can be used to modify the DNA of a cell in the subject in order to introduce a functional gene, which gives rise to a functional protein. Introduction of this functional gene and expression of the functional protein can relieve the disease state of the subject.

In other instances, a subject may have a disease in which a protein is overexpressed or is targeted by a virus for infection of a cell. Alternatively, a therapy such as a cell therapy for cancer can be ineffective due to repression of certain processes by tumor cells (e.g., checkpoint inhibition). Still alternatively, it may be desirable to eliminate a particular protein expressed at the surface of a cell in order to generate a universal, off-the-shelf cell therapy for a subject in need thereof (e.g., TCR). In such cases, it can be desirable to partially or completely knock out the gene encoding such a protein. Genome editing can be used to modify the DNA of a cell in the subject in order to partially or completely knock out the target gene, thus reducing or eliminating expression of the protein of interest.

Genome editing can include the use of any nuclease as described herein in combination with any DNA binding domain disclosed herein in order to bind to a target gene or target gene region and induce a double strand break, mediated by the nuclease. Genes can be introduced during this process, or DNA binding domains can be designed to cut at regions of the DNA such that after non-homologous end joining, the target gene or target gene region is removed. Genome editing systems that are further disclosed and described in detail herein can include DNA binding domains from Xanthomonas, Ralstonia, or Legionella fused to nucleases.

The specificity and efficiency of genome editing can be dependent on the nuclease responsible for cleavage. More than 3,000 type II restriction endonucleases have been identified. They recognize short, usually palindromic, sequences of 4-8 bp and, in the presence of Mg2+, cleave the DNA within or in close proximity to the recognition sequence. Naturally, type IIS restriction enzymes themselves have a DNA recognition domain that can be separated from the catalytic, or cleavage, domain. As such, since cleavage occurs at a site adjacent to the DNA sequence bound by the recognition domain, these enzymes can be referred to as exhibiting “shifted” cleavage. These type IIS restriction enzymes having both the recognition domain and the cleavage domain can be 400-600 amino acids in length. The main criterion for classifying a restriction endonuclease as a type II enzyme is that it cleaves specifically within or close to its recognition site and that it does not require ATP hydrolysis for its nucleolytic activity. An example of a type II restriction endonuclease is FokI, which consists of a DNA recognition domain and a non-specific DNA cleavage domain. FokI cleaves DNA nine and thirteen bases downstream of an asymmetric sequence (recognizing a DNA sequence of GGATG).
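The “shifted” cleavage described above can be sketched as simple coordinate arithmetic. The following Python snippet is a minimal illustration, not an implementation from the disclosure: the function name is hypothetical, only the strand supplied is scanned for GGATG, and it assumes that the nine-base offset applies to the strand carrying GGATG and the thirteen-base offset to the complementary strand.

```python
FOKI_SITE = "GGATG"


def foki_cut_positions(strand):
    """Return (site_start, cut_on_strand, cut_on_complement) for each GGATG.

    All values are 0-based indices into `strand`; a cut value c means the
    break falls between the bases at indices c - 1 and c.
    """
    results = []
    start = strand.find(FOKI_SITE)
    while start != -1:
        site_end = start + len(FOKI_SITE)
        cut_on_strand = site_end + 9        # nine bases past the site
        cut_on_complement = site_end + 13   # thirteen bases past the site
        # Report only sites whose cut positions lie within the sequence.
        if cut_on_complement <= len(strand):
            results.append((start, cut_on_strand, cut_on_complement))
        start = strand.find(FOKI_SITE, start + 1)
    return results


# Hypothetical 25-base test sequence containing one GGATG site.
example = "AA" + "GGATG" + "ACGTACGTACGTACGTAC"
print(foki_cut_positions(example))  # [(2, 16, 20)]
```

Under these assumptions the two cuts are offset by four bases, so the arithmetic yields a staggered rather than a blunt break.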

In some embodiments, the DNA cleavage domain at the C-terminus of FokI itself can be combined with a variety of DNA-binding domains (e.g., RNBDs, TALEs, MAP-NBDs) of other molecules for genome editing purposes. This cleavage domain can be 180 amino acids in length and can be directly linked to a DNA binding domain (e.g., RNBDs, TALEs, MAP-NBDs). In some embodiments, the FokI cleavage domain only comprises a single catalytic site. Thus, in order to cleave phosphodiester bonds, these enzymes form transient homodimers, providing two catalytic sites capable of cleaving double stranded DNA. In some embodiments, a single DNA-binding domain (e.g., RNBDs, TALEs, MAP-NBDs) linked to a Type IIS cleaving domain may not nick the double stranded DNA at the targeted site. In some embodiments, cleaving of target DNA only occurs when a pair of DNA-binding domains (e.g., RNBDs, TALEs, MAP-NBDs), each linked to a Type IIS cleaving domain (e.g., any one of SEQ ID NO: 1-SEQ ID NO: 81 (nucleotide sequences of SEQ ID NO: 82-SEQ ID NO: 162)), bind to opposing strands of DNA and allow for formation of a transient homodimer in the spacer region (the base pairs between the C-terminus of the DNA binding domain on a top strand of DNA and the C-terminus of the DNA binding domain on a bottom strand of DNA). Said spacer region can be greater than 2 base pairs, greater than 5 base pairs, greater than 10 base pairs, greater than 15 base pairs, greater than 24 base pairs, greater than 25 base pairs, greater than 30 base pairs, greater than 35 base pairs, greater than 40 base pairs, greater than 45 base pairs, or greater than 50 base pairs. In some embodiments, the spacer region can be anywhere from 2 to 50 base pairs, 5 to 40 base pairs, 10 to 30 base pairs, 14 to 40 base pairs, 24 to 30 base pairs, 24 to 40 base pairs, or 24 to 50 base pairs. In some embodiments, the nuclease disclosed herein (e.g., any one of SEQ ID NO: 1-SEQ ID NO: 81 (nucleotide sequences of SEQ ID NO: 82-SEQ ID NO: 162)) can be capable of cleaving over a spacer region of greater than 24 base pairs upon formation of a transient homodimer.
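The spacer region described above is the stretch of base pairs between the two binding sites, so its length follows from the site coordinates. The Python sketch below assumes both sites are given in coordinates of the same strand (0-based, end-exclusive); the function names and the example coordinates are hypothetical, and the default bounds of 24 to 40 base pairs are just one of the ranges recited above.

```python
def spacer_length(left_site_end, right_site_start):
    """Base pairs between the end of the left site and the start of the right."""
    if right_site_start < left_site_end:
        raise ValueError("binding sites overlap")
    return right_site_start - left_site_end


def spacer_in_range(length, minimum=24, maximum=40):
    """Check a spacer length against an inclusive range of base pairs."""
    return minimum <= length <= maximum


# Hypothetical pair: left site occupies 100..117, right site starts at 146.
length = spacer_length(left_site_end=118, right_site_start=146)
print(length, spacer_in_range(length))  # 28 True
```

A design tool would typically run such a check for every candidate pair of binding sites and keep only pairs whose spacer falls in the range supported by the chosen cleavage domain.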

In some instances, such enzymes can comprise one or more mutations relative to SEQ ID NO: 1-SEQ ID NO: 81 (nucleotide sequences of SEQ ID NO: 82-SEQ ID NO: 162). In some cases, the non-naturally occurring enzymes described herein can comprise about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more mutations. A mutation can be engineered to enhance cleavage efficiency. A mutation can abolish cleavage activity. In some cases, a mutation can enhance homodimerization. For example, FokI can have a mutation at one or more of amino acid residue positions 446, 447, 479, 483, 484, 486, 487, 490, 491, 496, 498, 499, 500, 531, 534, 537, and 538 to modulate homodimerization, and similar mutations can be designed based on the phylogenetic analysis of SEQ ID NO: 1-SEQ ID NO: 81 (nucleotide sequences of SEQ ID NO: 82-SEQ ID NO: 162).

TABLE 7 shows exemplary amino acid sequences (SEQ ID NO: 1-SEQ ID NO: 81) of endonucleases for genome editing and the corresponding back-translated nucleic acid sequences (SEQ ID NO: 82-SEQ ID NO: 162) of the endonucleases, which were obtained using Geneious software and selecting for human codon optimization.

TABLE 7 Amino Acid and Nucleic Acid Sequences of Endonucleases SEQ SEQID Amino Acid ID NO Sequence NO Back Translated Nucleic Acid Sequences 1FLVKGAMEIKKSEL 82 TTCCTGGTGAAGGGCGCCATGGAGATCAAGAAGAGCGAGCTGARHKLRHVPHEYIELI GGCACAAGCTGAGGCACGTGCCCCACGAGTACATCGAGCTGATEIAQDSKQNRLLEF CGAGATCGCCCAGGACAGCAAGCAGAACAGGCTGCTGGAGTTCKVVEFFKKIYGYRG AAGGTGGTGGAGTTCTTCAAGAAGATCTACGGCTACAGGGGCAKHLGGSRKPDGALF AGCACCTGGGCGGCAGCAGGAAGCCCGACGGCGCCCTGTTCACTDGLVLNHGIILDT CGACGGCCTGGTGCTGAACCACGGCATCATCCTGGACACCAAGGKAYKDGYRLPISQA CCTACAAGGACGGCTACAGGCTGCCCATCAGCCAGGCCGACGA DEMQRYVDENNKRGATGCAGAGGTACGTGGACGAGAACAACAAGAGGAGCCAGGTG SQVINPNEWWEIYPATCAACCCCAACGAGTGGTGGGAGATCTACCCCACCAGCATCAC TSITDFKFLFVSGFFCGACTTCAAGTTCCTGTTCGTGAGCGGCTTCTTCCAGGGCGACT QGDYRKQLERVSHACAGGAAGCAGCTGGAGAGGGTGAGCCACCTGACCAAGTGCCA LTKCQGAVMSVEQGGGCGCCGTGATGAGCGTGGAGCAGCTGCTGCTGGGCGGCGAG LLLGGEKIKEGSLTLAAGATCAAGGAGGGCAGCCTGACCCTGGAGGAGGTGGGCAAGA EEVGKKFKNDEIVFAGTTCAAGAACGACGAGATCGTGTTC 2 QIVKSSIEMSKANM 83CAGATCGTGAAGAGCAGCATCGAGATGAGCAAGGCCAACATGA RDNLQMLPHDYIELGGGACAACCTGCAGATGCTGCCCCACGACTACATCGAGCTGATC IEISQDPYQNRIFEMGAGATCAGCCAGGACCCCTACCAGAACAGGATCTTCGAGATGA KVMDLFINEYGFSGAGGTGATGGACCTGTTCATCAACGAGTACGGCTTCAGCGGCAGC SHLGGSRKPDGAMCACCTGGGCGGCAGCAGGAAGCCCGACGGCGCCATGTACGCCC YAHGFGVIVDTKAACGGCTTCGGCGTGATCGTGGACACCAAGGCCTACAAGGACGG YKDGYNLPISQADECTACAACCTGCCCATCAGCCAGGCCGACGAGATGGAGAGGTAC MERYVRENIDRNEHGTGAGGGAGAACATCGACAGGAACGAGCACGTGAACAGCAACA VNSNRWWNIFPEDTGGTGGTGGAACATCTTCCCCGAGGACACCAACGAGTACAAGTTC NEYKFLFVSGFFKGCTGTTCGTGAGCGGCTTCTTCAAGGGCAACTTCGAGAAGCAGCT NFEKQLERISIDTGVGGAGAGGATCAGCATCGACACCGGCGTGCAGGGCGGCGCCCTG QGGALSVEHLLLGAAGCGTGGAGCACCTGCTGCTGGGCGCCGAGTACATCAAGAGGG EYIKRGILTLYDFKNGCATCCTGACCCTGTACGACTTCAAGAACAGCTTCCTGAACAAG SFLNKEIQF GAGATCCAGTTC 3QTIKSSIEELKSELR 84 CAGACCATCAAGAGCAGCATCGAGGAGCTGAAGAGCGAGCTGATQLNVISHDYLQLV GGACCCAGCTGAACGTGATCAGCCACGACTACCTGCAGCTGGTGDISQDSQQNRLFEM GACATCAGCCAGGACAGCCAGCAGAACAGGCTGTTCGAGATGAKVMDLFINEFGYNG AGGTGATGGACCTGTTCATCAACGAGTTCGGCTACAACGGCAGCSHLGGSRKPDGILY 
CACCTGGGCGGCAGCAGGAAGCCCGACGGCATCCTGTACACCGTEGLSKDYGIIVDT AGGGCCTGAGCAAGGACTACGGCATCATCGTGGACACCAAGGC KAYKDGYNLPIAQCTACAAGGACGGCTACAACCTGCCCATCGCCCAGGCCGACGAG ADEMERYIRENIDRATGGAGAGGTACATCAGGGAGAACATCGACAGGAACGAGGTGG NEVVNPNRWWEVFTGAACCCCAACAGGTGGTGGGAGGTGTTCCCCAGCAAGATCAA PSKINDYKFLFVSACGACTACAAGTTCCTGTTCGTGAGCGCCTACTTCAAGGGCAACT YFKGNFKEQLERISITCAAGGAGCAGCTGGAGAGGATCAGCATCAACACCGGCATCCT NTGILGGAISVEHLLGGGCGGCGCCATCAGCGTGGAGCACCTGCTGCTGGGCGCCGAG LGAEYFKRGILSLETACTTCAAGAGGGGCATCCTGAGCCTGGAGGACGTGAGGGACA DVRDKFCNTEIEFAGTTCTGCAACACCGAGATCGAGTTC 4 GKSEVETIKEQMRG 85GGCAAGAGCGAGGTGGAGACCATCAAGGAGCAGATGAGGGGC ELTHLSHEYLGLLDGAGCTGACCCACCTGAGCCACGAGTACCTGGGCCTGCTGGACCT LAYDSKQNRLFELKGGCCTACGACAGCAAGCAGAACAGGCTGTTCGAGCTGAAGACC TMQLLTEECGFEGLATGCAGCTGCTGACCGAGGAGTGCGGCTTCGAGGGCCTGCACCT HLGGSRKPDGIVYTGGGCGGCAGCAGGAAGCCCGACGGCATCGTGTACACCAAGGAC KDENEQVGKENYGIGAGAACGAGCAGGTGGGCAAGGAGAACTACGGCATCATCATCG IIDTKAYSGGYSLPIACACCAAGGCCTACAGCGGCGGCTACAGCCTGCCCATCAGCCA SQADEMERYIGENQGGCCGACGAGATGGAGAGGTACATCGGCGAGAACCAGACCAGG TRDIRINPNEWWKNGACATCAGGATCAACCCCAACGAGTGGTGGAAGAACTTCGGCG FGDGVTEYYYLFVACGGCGTGACCGAGTACTACTACCTGTTCGTGGCCGGCCACTTC AGHFKGKYQEQIDRAAGGGCAAGTACCAGGAGCAGATCGACAGGATCAACTGCAACA INCNKNIKGAAVSIAGAACATCAAGGGCGCCGCCGTGAGCATCCAGCAGCTGCTGAG QQLLRIVNDYKAGGATCGTGAACGACTACAAGGCCGGCAAGCTGACCCACGAGGAC KLTHEDMKLKIFHYATGAAGCTGAAGATCTTCCACTAC 5 MKILELLINECGYK 86ATGAAGATCCTGGAGCTGCTGATCAACGAGTGCGGCTACAAGG GLHLGGARKPDGIIGCCTGCACCTGGGCGGCGCCAGGAAGCCCGACGGCATCATCTAC YTEKEKYNYGVIIDACCGAGAAGGAGAAGTACAACTACGGCGTGATCATCGACACCA TKAYSKGYNLPIGQAGGCCTACAGCAAGGGCTACAACCTGCCCATCGGCCAGATCGA IDEMIRYIIENNERNICGAGATGATCAGGTACATCATCGAGAACAACGAGAGGAACATC KRNTNCWWNNFEKAAGAGGAACACCAACTGCTGGTGGAACAACTTCGAGAAGAACG NVNEFYFSFISGEFTTGAACGAGTTCTACTTCAGCTTCATCAGCGGCGAGTTCACCGGC GNIEEKLNRIFISTNIAACATCGAGGAGAAGCTGAACAGGATCTTCATCAGCACCAACA KGNAMSVKTLLYLTCAAGGGCAACGCCATGAGCGTGAAGACCCTGCTGTACCTGGCC ANEIKANRISYIELLAACGAGATCAAGGCCAACAGGATCAGCTACATCGAGCTGCTGA NYFDNKV ACTACTTCGACAACAAGGTG6 AKSSQSETKEKLRE 87 
GCCAAGAGCAGCCAGAGCGAGACCAAGGAGAAGCTGAGGGAGKLRNLPHEYLSLVD AAGCTGAGGAACCTGCCCCACGAGTACCTGAGCCTGGTGGACCTLAYDSKQNRLFEM GGCCTACGACAGCAAGCAGAACAGGCTGTTCGAGATGAAGGTG KVIELLTEECGFQGATCGAGCTGCTGACCGAGGAGTGCGGCTTCCAGGGCCTGCACCT LHLGGSRRPDGVLYGGGCGGCAGCAGGAGGCCCGACGGCGTGCTGTACACCGCCGGC TAGLTDNYGIILDTCTGACCGACAACTACGGCATCATCCTGGACACCAAGGCCTACAG KAYSSGYSLPIAQACAGCGGCTACAGCCTGCCCATCGCCCAGGCCGACGAGATGGAG DEMERYVRENQTRAGGTACGTGAGGGAGAACCAGACCAGGGACGAGCTGGTGAACC DELVNPNQWWENFCCAACCAGTGGTGGGAGAACTTCGAGAACGGCCTGGGCACCTTC ENGLGTFYFLFVAGTACTTCCTGTTCGTGGCCGGCCACTTCAACGGCAACGTGCAGGC HFNGNVQAQLERISCCAGCTGGAGAGGATCAGCAGGAACACCGGCGTGCTGGGCGCC RNTGVLGAAASISQGCCGCCAGCATCAGCCAGCTGCTGCTGCTGGCCGACGCCATCAG LLLLADAIRGGRMDGGGCGGCAGGATGGACAGGGAGAGGCTGAGGCACCTGATGTTC RERLRHLMFQNEEFCAGAACGAGGAGTTCCTG L 7 NSEKSEFTQEKDNL 88AACAGCGAGAAGAGCGAGTTCACCCAGGAGAAGGACAACCTGA REKLDTLSHEYLSLGGGAGAAGCTGGACACCCTGAGCCACGAGTACCTGAGCCTGGT VDLAFDSQQNRLFEGGACCTGGCCTTCGACAGCCAGCAGAACAGGCTGTTCGAGATG MKTVELLTKECNYAAGACCGTGGAGCTGCTGACCAAGGAGTGCAACTACAAGGGCG KGVHLGGSRKPDGITGCACCTGGGCGGCAGCAGGAAGCCCGACGGCATCATCTACAC IYTENSTDNYGVIIDCGAGAACAGCACCGACAACTACGGCGTGATCATCGACACCAAG TKAYSNGYNLPISQGCCTACAGCAACGGCTACAACCTGCCCATCAGCCAGGTGGACG VDEMVRYVEENNKAGATGGTGAGGTACGTGGAGGAGAACAACAAGAGGGAGAAGG REKERNSNEWWKEAGAGGAACAGCAACGAGTGGTGGAAGGAGTTCGGCGACAACAT FGDNINKFYFSFISGCAACAAGTTCTACTTCAGCTTCATCAGCGGCAAGTTCATCGGCA KFIGNIEEKLQRITIFACATCGAGGAGAAGCTGCAGAGGATCACCATCTTCACCAACGT TNVYGNAMTIITLLGTACGGCAACGCCATGACCATCATCACCCTGCTGTACCTGGCCA YLANEIKANRLKTMACGAGATCAAGGCCAACAGGCTGAAGACCATGGAGGTGGTGAA EVVKYFDNKVGTACTTCGACAACAAGGTG 8 NLTCSDLTEIKEEVR 89AACCTGACCTGCAGCGACCTGACCGAGATCAAGGAGGAGGTGA NALTHLSHEYLALIGGAACGCCCTGACCCACCTGAGCCACGAGTACCTGGCCCTGATC DLAYDSTQNRLFEGACCTGGCCTACGACAGCACCCAGAACAGGCTGTTCGAGATGA MKTLQLLVEECGYAGACCCTGCAGCTGCTGGTGGAGGAGTGCGGCTACCAGGGCAC QGTHLGGSRKPDGICCACCTGGGCGGCAGCAGGAAGCCCGACGGCATCTGCTACAGC CYSEEAKSEGLEANGAGGAGGCCAAGAGCGAGGGCCTGGAGGCCAACTACGGCATCA YGIIIDTKSYSGGYGTCATCGACACCAAGAGCTACAGCGGCGGCTACGGCCTGCCCATC 
LPISQADEMERYIREAGCCAGGCCGACGAGATGGAGAGGTACATCAGGGAGAACCAGA NQTRDAEVNRNKWCCAGGGACGCCGAGGTGAACAGGAACAAGTGGTGGGAGGCCTT WEAFPETIDIFYFMFCCCCGAGACCATCGACATCTTCTACTTCATGTTCGTGGCCGGCC VAGHFKGNYFNQLACTTCAAGGGCAACTACTTCAACCAGCTGGAGAGGCTGCAGAG ERLQRSTGIKGAAVGAGCACCGGCATCAAGGGCGCCGCCGTGGACATCAAGACCCTG DIKTLLLTANRCKTCTGCTGACCGCCAACAGGTGCAAGACCGGCGAGCTGGACCACG GELDHAGIESCFFNCCGGCATCGAGAGCTGCTTCTTCAACAACTGCAGGCTG NCRL 9 DNVKSNFNQEKDE 90GACAACGTGAAGAGCAACTTCAACCAGGAGAAGGACGAGCTGA LREKLDTLSHEYLYGGGAGAAGCTGGACACCCTGAGCCACGAGTACCTGTACCTGCTG LLDLAYDSKQNKLFGACCTGGCCTACGACAGCAAGCAGAACAAGCTGTTCGAGATGA EMKILELLINECGYAGATCCTGGAGCTGCTGATCAACGAGTGCGGCTACAGGGGCCTG RGLHLGGVRKPDGICACCTGGGCGGCGTGAGGAAGCCCGACGGCATCATCTACACCG IYTEKEKYNYGVIIDAGAAGGAGAAGTACAACTACGGCGTGATCATCGACACCAAGGC TKAYSKGYNLPIGQCTACAGCAAGGGCTACAACCTGCCCATCGGCCAGATCGACGAG IDEMIRYIIENNERNIATGATCAGGTACATCATCGAGAACAACGAGAGGAACATCAAGA KRNTNCWWNNFEKGGAACACCAACTGCTGGTGGAACAACTTCGAGAAGAACGTGAA NVNEFYFSFISGEFTCGAGTTCTACTTCAGCTTCATCAGCGGCGAGTTCACCGGCAACA GNIEEKLNRIFISTNITCGAGGAGAAGCTGAACAGGATCTTCATCAGCACCAACATCAA KGNAMSVKTLLYLGGGCAACGCCATGAGCGTGAAGACCCTGCTGTACCTGGCCAAC ANEIKANRISFLEMEGAGATCAAGGCCAACAGGATCAGCTTCCTGGAGATGGAGAAGT KYFDNKV ACTTCGACAACAAGGTG 10EGIKSNISLLKDELR 91 GAGGGCATCAAGAGCAACATCAGCCTGCTGAAGGACGAGCTGAGQISHISHEYLSLID GGGGCCAGATCAGCCACATCAGCCACGAGTACCTGAGCCTGATCLAFDSKQNRLFEMK GACCTGGCCTTCGACAGCAAGCAGAACAGGCTGTTCGAGATGAVLELLVNEYGFKGR AGGTGCTGGAGCTGCTGGTGAACGAGTACGGCTTCAAGGGCAGHLGGSRKPDGIVYS GCACCTGGGCGGCAGCAGGAAGCCCGACGGCATCGTGTACAGCTTLEDNFGIIVDTKA ACCACCCTGGAGGACAACTTCGGCATCATCGTGGACACCAAGGCYSEGYSLPISQADE CTACAGCGAGGGCTACAGCCTGCCCATCAGCCAGGCCGACGAG MERYVRENSNRDEATGGAGAGGTACGTGAGGGAGAACAGCAACAGGGACGAGGAG EVNPNKWWENFSEGTGAACCCCAACAAGTGGTGGGAGAACTTCAGCGAGGAGGTGA EVKKYYFVFISGSFAGAAGTACTACTTCGTGTTCATCAGCGGCAGCTTCAAGGGCAAG KGKFEEQLRRLSMTTTCGAGGAGCAGCTGAGGAGGCTGAGCATGACCACCGGCGTGA TGVNGSAVNVVNLACGGCAGCGCCGTGAACGTGGTGAACCTGCTGCTGGGCGCCGA LLGAEKIRSGEMTIEGAAGATCAGGAGCGGCGAGATGACCATCGAGGAGCTGGAGAGG 
ELERAMFNNSEFIGCCATGTTCAACAACAGCGAGTTCATC 11 ISKTNVLELKDKVR 92ATCAGCAAGACCAACGTGCTGGAGCTGAAGGACAAGGTGAGGG DKLKYVDNRYLALIACAAGCTGAAGTACGTGGACAACAGGTACCTGGCCCTGATCGA DLAYDGTANRDFEICCTGGCCTACGACGGCACCGCCAACAGGGACTTCGAGATCCAG QTIDLLINELKFKGVACCATCGACCTGCTGATCAACGAGCTGAAGTTCAAGGGCGTGAG RLGESRKPDGIISYDGCTGGGCGAGAGCAGGAAGCCCGACGGCATCATCAGCTACGAC INGVIIDNKAYSSGYATCAACGGCGTGATCATCGACAACAAGGCCTACAGCAGCGGCT NLPINQADEMIRYIEACAACCTGCCCATCAACCAGGCCGACGAGATGATCAGGTACATC ENQTRDKKINPNKGAGGAGAACCAGACCAGGGACAAGAAGATCAACCCCAACAAGT WWESFDDKVKDFNGGTGGGAGAGCTTCGACGACAAGGTGAAGGACTTCAACTACCT YLFVSSFFKGNFKNGTTCGTGAGCAGCTTCTTCAAGGGCAACTTCAAGAACAACCTGA NLKHIANRTGVNGAGCACATCGCCAACAGGACCGGCGTGAACGGCGGCGTGATCAA GVINVENLLYFAEECGTGGAGAACCTGCTGTACTTCGCCGAGGAGCTGAAGAGCGGC LKSGRLSYVDLFKMAGGCTGAGCTACGTGGACCTGTTCAAGATGTACGACAACGACG YDNDEINI AGATCAACATC 12ISKTNVLELKDKVR 93 ATCAGCAAGACCAACGTGCTGGAGCTGAAGGACAAGGTGAGGGDKLKYVDHRYLALI ACAAGCTGAAGTACGTGGACCACAGGTACCTGGCCCTGATCGACDLAYDGTANRDFEI CTGGCCTACGACGGCACCGCCAACAGGGACTTCGAGATCCAGAQTIDLLINELKFKGV CCATCGACCTGCTGATCAACGAGCTGAAGTTCAAGGGCGTGAGGRLGESRKPDGIISYD CTGGGCGAGAGCAGGAAGCCCGACGGCATCATCAGCTACGACAINGVIIDNKAYSTGY TCAACGGCGTGATCATCGACAACAAGGCCTACAGCACCGGCTACNLPINQADEMIRYIE AACCTGCCCATCAACCAGGCCGACGAGATGATCAGGTACATCGENQTRDKKINSNK AGGAGAACCAGACCAGGGACAAGAAGATCAACAGCAACAAGT WWESFDDKVKNFNGGTGGGAGAGCTTCGACGACAAGGTGAAGAACTTCAACTACCT YLFVSSFFKGNFKNGTTCGTGAGCAGCTTCTTCAAGGGCAACTTCAAGAACAACCTGA NLKHIANRTGVNGAGCACATCGCCAACAGGACCGGCGTGAACGGCGGCGCCATCAA GAINVENLLYFAEECGTGGAGAACCTGCTGTACTTCGCCGAGGAGCTGAAGGCCGGC LKAGRLSYVDSFTMAGGCTGAGCTACGTGGACAGCTTCACCATGTACGACAACGACG YDNDEIYV AGATCTACGTG 13KAEKSEFLIEKDKL 94 AAGGCCGAGAAGAGCGAGTTCCTGATCGAGAAGGACAAGCTGAREKLDTLPHDYLSM GGGAGAAGCTGGACACCCTGCCCCACGACTACCTGAGCATGGTVDLAYDSKQNRLFE GGACCTGGCCTACGACAGCAAGCAGAACAGGCTGTTCGAGATGMKTIELLINECNYK AAGACCATCGAGCTGCTGATCAACGAGTGCAACTACAAGGGCCGLHLGGTRKPDGIV TGCACCTGGGCGGCACCAGGAAGCCCGACGGCATCGTGTACACYTNNEVENYGIIIDT CAACAACGAGGTGGAGAACTACGGCATCATCATCGACACCAAGKAYSKGYNLPISQV 
GCCTACAGCAAGGGCTACAACCTGCCCATCAGCCAGGTGGACG DEMTRYVEENNKRAGATGACCAGGTACGTGGAGGAGAACAACAAGAGGGAGAAGA EKKRNPNEWWNNFAGAGGAACCCCAACGAGTGGTGGAACAACTTCGACAGCAACGT DSNVKKFYFSFISGGAAGAAGTTCTACTTCAGCTTCATCAGCGGCAAGTTCGTGGGCA KFVGNIEEKLQRITLACATCGAGGAGAAGCTGCAGAGGATCACCCTGTTCACCGAGAT FTEIYGNAITVTTLLCTACGGCAACGCCATCACCGTGACCACCCTGCTGTACATCGCCA YIANEIKANRMKKSACGAGATCAAGGCCAACAGGATGAAGAAGAGCGACATCATGGA DIMEYFNDKVGTACTTCAACGACAAGGTG 14 ISKTNVLELKDKVR 95ATCAGCAAGACCAACGTGCTGGAGCTGAAGGACAAGGTGAGGG DKLKYVDHRYLALIACAAGCTGAAGTACGTGGACCACAGGTACCTGGCCCTGATCGAC DLAYDGTANRDFEICTGGCCTACGACGGCACCGCCAACAGGGACTTCGAGATCCAGA QTIDLLINELKFKGVCCATCGACCTGCTGATCAACGAGCTGAAGTTCAAGGGCGTGAGG RLGESRKPDGIISYNCTGGGCGAGAGCAGGAAGCCCGACGGCATCATCAGCTACAACA INGVIIDNKAYSTGYTCAACGGCGTGATCATCGACAACAAGGCCTACAGCACCGGCTAC NLPINQADEMIRYIEAACCTGCCCATCAACCAGGCCGACGAGATGATCAGGTACATCG ENQTRDEKINSNKWAGGAGAACCAGACCAGGGACGAGAAGATCAACAGCAACAAGT WESFDDEVKDFNYGGTGGGAGAGCTTCGACGACGAGGTGAAGGACTTCAACTACCT LFVSSFFKGNFKNNGTTCGTGAGCAGCTTCTTCAAGGGCAACTTCAAGAACAACCTGA LKHIANRTGVNGGAGCACATCGCCAACAGGACCGGCGTGAACGGCGGCGCCATCAA AINVENLLYFAEELCGTGGAGAACCTGCTGTACTTCGCCGAGGAGCTGAAGGCCGGC KAGRLSYVDSFTMAGGCTGAGCTACGTGGACAGCTTCACCATGTACGACAACGACG YDNDEIYV AGATCTACGTG 15ISKTNILELKDKVRD 96 ATCAGCAAGACCAACATCCTGGAGCTGAAGGACAAGGTGAGGGKLKYVDHRYLALID ACAAGCTGAAGTACGTGGACCACAGGTACCTGGCCCTGATCGACLAYDGTANRDFEIQ CTGGCCTACGACGGCACCGCCAACAGGGACTTCGAGATCCAGATIDLLINELKFKGVR CCATCGACCTGCTGATCAACGAGCTGAAGTTCAAGGGCGTGAGGLGESRKPDGIISYNI CTGGGCGAGAGCAGGAAGCCCGACGGCATCATCAGCTACAACANGVIIDNKAYSTGY TCAACGGCGTGATCATCGACAACAAGGCCTACAGCACCGGCTACNLPINQADEMIRYIE AACCTGCCCATCAACCAGGCCGACGAGATGATCAGGTACATCGENQTRDEKINSNKW AGGAGAACCAGACCAGGGACGAGAAGATCAACAGCAACAAGT WESFDEKVKDFNYGGTGGGAGAGCTTCGACGAGAAGGTGAAGGACTTCAACTACCT LFVSSFFKGNFKNNGTTCGTGAGCAGCTTCTTCAAGGGCAACTTCAAGAACAACCTGA LKHIANRTGVNGGAGCACATCGCCAACAGGACCGGCGTGAACGGCGGCGCCATCAA AINVENLLYFAEELCGTGGAGAACCTGCTGTACTTCGCCGAGGAGCTGAAGGCCGGC KAGRISYLDSFKMYAGGATCAGCTACCTGGACAGCTTCAAGATGTACAACAACGACG NNDEIYL AGATCTACCTG 
16ISKTNVLELKDKVR 97 ATCAGCAAGACCAACGTGCTGGAGCTGAAGGACAAGGTGAGGGDKLKYVDHRYLALI ACAAGCTGAAGTACGTGGACCACAGGTACCTGGCCCTGATCGACDLAYDGTANRDFEI CTGGCCTACGACGGCACCGCCAACAGGGACTTCGAGATCCAGAQTIDLLINELKFKGV CCATCGACCTGCTGATCAACGAGCTGAAGTTCAAGGGCGTGAGGRLGESRKPDGIISYN CTGGGCGAGAGCAGGAAGCCCGACGGCATCATCAGCTACAACAINGVIIDNKAYSTGY TCAACGGCGTGATCATCGACAACAAGGCCTACAGCACCGGCTACNLPINQADEMIRYIE AACCTGCCCATCAACCAGGCCGACGAGATGATCAGGTACATCGENQTRDEKINSNKW AGGAGAACCAGACCAGGGACGAGAAGATCAACAGCAACAAGT WESFDDKVKDFNYGGTGGGAGAGCTTCGACGACAAGGTGAAGGACTTCAACTACCT LFVSSFFKGNFKNNGTTCGTGAGCAGCTTCTTCAAGGGCAACTTCAAGAACAACCTGA LKHIANRTGVSGGAAGCACATCGCCAACAGGACCGGCGTGAGCGGCGGCGCCATCAA INVENLLYFAEELKCGTGGAGAACCTGCTGTACTTCGCCGAGGAGCTGAAGGCCGGC AGRLSYVDSFKMYAGGCTGAGCTACGTGGACAGCTTCAAGATGTACGACAACGACG DNDEIYV AGATCTACGTG 17ISKTNVLELKDKVR 98 ATCAGCAAGACCAACGTGCTGGAGCTGAAGGACAAGGTGAGGANKLKYVDHRYLALI ACAAGCTGAAGTACGTGGACCACAGGTACCTGGCCCTGATCGACDLAYDGTANRDFEI CTGGCCTACGACGGCACCGCCAACAGGGACTTCGAGATCCAGAQTIDLLINELKFKGV CCATCGACCTGCTGATCAACGAGCTGAAGTTCAAGGGCGTGAGGRLGESRKPDGIISYD CTGGGCGAGAGCAGGAAGCCCGACGGCATCATCAGCTACGACAINGVIIDNKSYSTGY TCAACGGCGTGATCATCGACAACAAGAGCTACAGCACCGGCTANLPINQADEMIRYIE CAACCTGCCCATCAACCAGGCCGACGAGATGATCAGGTACATCGENQTRDEKINSNKW AGGAGAACCAGACCAGGGACGAGAAGATCAACAGCAACAAGT WESFDEKVKDFNYGGTGGGAGAGCTTCGACGAGAAGGTGAAGGACTTCAACTACCT LFVSSFFKGNFKNNGTTCGTGAGCAGCTTCTTCAAGGGCAACTTCAAGAACAACCTGA LKHIANRTGVNGGAGCACATCGCCAACAGGACCGGCGTGAACGGCGGCGCCATCAA AINVENLLYFAEELCGTGGAGAACCTGCTGTACTTCGCCGAGGAGCTGAAGAGCGGC KSGRLSYVDSFTMYAGGCTGAGCTACGTGGACAGCTTCACCATGTACGACAACGACG DNDEIYV AGATCTACGTG 18ISKTNVLELKDKVR 99 ATCAGCAAGACCAACGTGCTGGAGCTGAAGGACAAGGTGAGGGDKLKYVDHRYLSLI ACAAGCTGAAGTACGTGGACCACAGGTACCTGAGCCTGATCGADLAYDGNANRDFEI CCTGGCCTACGACGGCAACGCCAACAGGGACTTCGAGATCCAGQTIDLLINELNFKGV ACCATCGACCTGCTGATCAACGAGCTGAACTTCAAGGGCGTGAGRLGESRKPDGIISYN GCTGGGCGAGAGCAGGAAGCCCGACGGCATCATCAGCTACAACINGVIIDNKAYSTGY ATCAACGGCGTGATCATCGACAACAAGGCCTACAGCACCGGCTNLPINQADEMIRYIE 
ACAACCTGCCCATCAACCAGGCCGACGAGATGATCAGGTACATCENQTRDEKINSNKW GAGGAGAACCAGACCAGGGACGAGAAGATCAACAGCAACAAG WESFDDKVKDFNYTGGTGGGAGAGCTTCGACGACAAGGTGAAGGACTTCAACTACCT LFVSSFFKGNFKNNGTTCGTGAGCAGCTTCTTCAAGGGCAACTTCAAGAACAACCTGA LKHIANRTGVSGGAAGCACATCGCCAACAGGACCGGCGTGAGCGGCGGCGCCATCAA INVENLLYFAEELKCGTGGAGAACCTGCTGTACTTCGCCGAGGAGCTGAAGGCCGGC AGRLSYADSFTMYAGGCTGAGCTACGCCGACAGCTTCACCATGTACGACAACGACG DNDEIYV AGATCTACGTG 19IAKTNVLGLKDKVR 100 ATCGCCAAGACCAACGTGCTGGGCCTGAAGGACAAGGTGAGGGDRLKYVDHRYLALI ACAGGCTGAAGTACGTGGACCACAGGTACCTGGCCCTGATCGACDLAYDGTANRDFEI CTGGCCTACGACGGCACCGCCAACAGGGACTTCGAGATCCAGAQTIDLLINELKFKGV CCATCGACCTGCTGATCAACGAGCTGAAGTTCAAGGGCGTGAGGRLGESRKPDGIISYN CTGGGCGAGAGCAGGAAGCCCGACGGCATCATCAGCTACAACGVNGVIIDNKAYSKG TGAACGGCGTGATCATCGACAACAAGGCCTACAGCAAGGGCTAYNLPINQADEMIRYI CAACCTGCCCATCAACCAGGCCGACGAGATGATCAGGTACATCGEENQTRDEKINANK AGGAGAACCAGACCAGGGACGAGAAGATCAACGCCAACAAGTG WWESFDDKVEEFSGTGGGAGAGCTTCGACGACAAGGTGGAGGAGTTCAGCTACCTG YLFVSSFFKGNFKNTTCGTGAGCAGCTTCTTCAAGGGCAACTTCAAGAACAACCTGAA NLKHIANRTGVNGGCACATCGCCAACAGGACCGGCGTGAACGGCGGCGCCATCAAC GAINVENLLYFAEEGTGGAGAACCTGCTGTACTTCGCCGAGGAGCTGAAGAGCGGCA LKSGRLSYMDSFSLGGCTGAGCTACATGGACAGCTTCAGCCTGTACGACAACGACGA YDNDEICV GATCTGCGTG 20ELKDEQSEKRKAKF 101 GAGCTGAAGGACGAGCAGAGCGAGAAGAGGAAGGCCAAGTTCCLKETKLPMKYIELL TGAAGGAGACCAAGCTGCCCATGAAGTACATCGAGCTGCTGGADIAYDGKRNRDFEI CATCGCCTACGACGGCAAGAGGAACAGGGACTTCGAGATCGTGVTMELFREVYRLNS ACCATGGAGCTGTTCAGGGAGGTGTACAGGCTGAACAGCAAGCKLLGGGRKPDGLIY TGCTGGGCGGCGGCAGGAAGCCCGACGGCCTGATCTACACCGATDDFGVIVDTKAYG CGACTTCGGCGTGATCGTGGACACCAAGGCCTACGGCGAGGGCTEGYSKSINQADEMI ACAGCAAGAGCATCAACCAGGCCGACGAGATGATCAGGTACATRYIEDNKRRDEKRN CGAGGACAACAAGAGGAGGGACGAGAAGAGGAACCCCATCAAPIKWWESFPSSISQN GTGGTGGGAGAGCTTCCCCAGCAGCATCAGCCAGAACAACTTCTNFYFLWVSSKFVGK ACTTCCTGTGGGTGAGCAGCAAGTTCGTGGGCAAGTTCCAGGAGFQEQLAYTANETQT CAGCTGGCCTACACCGCCAACGAGACCCAGACCAAGGGCGGCGKGGAINVEQILIGA CCATCAACGTGGAGCAGATCCTGATCGGCGCCGACCTGATCATGDLIMQKMLDINTIPS CAGAAGATGCTGGACATCAACACCATCCCCAGCTTCTTCGAGAA PPENQEIIFCCAGGAGATCATCTTC 21 
IFKTNVLELKDSIRE 102ATCTTCAAGACCAACGTGCTGGAGCTGAAGGACAGCATCAGGG KLDYIDHRYLSLVDAGAAGCTGGACTACATCGACCACAGGTACCTGAGCCTGGTGGA LAYDSKANRDFEIQCCTGGCCTACGACAGCAAGGCCAACAGGGACTTCGAGATCCAG TIDLLINELDFKGLRACCATCGACCTGCTGATCAACGAGCTGGACTTCAAGGGCCTGAG LGESRKPDGIISYDIGCTGGGCGAGAGCAGGAAGCCCGACGGCATCATCAGCTACGAC NGVIIDNKAYSKGYATCAACGGCGTGATCATCGACAACAAGGCCTACAGCAAGGGCT NLPINQADEMIRYIQACAACCTGCCCATCAACCAGGCCGACGAGATGATCAGGTACATC ENQSRNEKINPNKWCAGGAGAACCAGAGCAGGAACGAGAAGATCAACCCCAACAAGT WENFEDKVIKFNYLGGTGGGAGAACTTCGAGGACAAGGTGATCAAGTTCAACTACCT FISSLFVGGFKKNLQGTTCATCAGCAGCCTGTTCGTGGGCGGCTTCAAGAAGAACCTGC HIANRTGVNGGAIDAGCACATCGCCAACAGGACCGGCGTGAACGGCGGCGCCATCGA VENLLYFAEEIKSGCGTGGAGAACCTGCTGTACTTCGCCGAGGAGATCAAGAGCGGC RLTYKDSFSRYINDAGGCTGACCTACAAGGACAGCTTCAGCAGGTACATCAACGACG EIKM AGATCAAGATG 22LPVKSEVSVFKDYL 103 CTGCCCGTGAAGAGCGAGGTGAGCGTGTTCAAGGACTACCTGARTHLTHVDHRYLIL GGACCCACCTGACCCACGTGGACCACAGGTACCTGATCCTGGTGVDLGFDGSSDRDYE GACCTGGGCTTCGACGGCAGCAGCGACAGGGACTACGAGATGA MKTAELFTAELGFAGACCGCCGAGCTGTTCACCGCCGAGCTGGGCTTCATGGGCGCC MGARLGDTRKPDVAGGCTGGGCGACACCAGGAAGCCCGACGTGTGCGTGTACCACG CVYHGANGLIIDNKGCGCCAACGGCCTGATCATCGACAACAAGGCCTACGGCAAGGG AYGKGYSLPIKQADCTACAGCCTGCCCATCAAGCAGGCCGACGAGATCTACAGGTACA EIYRYIEENKERDATCGAGGAGAACAAGGAGAGGGACGCCAGGCTGAACCCCAACCA RLNPNQWWKVFDEGTGGTGGAAGGTGTTCGACGAGAGCGTGACCCACTTCAGGTTCG SVTHFRFAFISGSFTCCTTCATCAGCGGCAGCTTCACCGGCGGCTTCAAGGACAGGATC GGFKDRIELISMRSGGAGCTGATCAGCATGAGGAGCGGCATCTGCGGCGCCGCCGTGA ICGAAVNSVNLLLMACAGCGTGAACCTGCTGCTGATGGCCGAGGAGCTGAAGAGCGG AEELKSGRLDYEECAGGCTGGACTACGAGGAGTGGTTCCAGTACTTCGACTGCAACG WFQYFDCNDEISFACGAGATCAGCTTC 23 ISVKSDMAVVKDSV 104ATCAGCGTGAAGAGCGACATGGCCGTGGTGAAGGACAGCGTGA RERLAHVSHEYLILIGGGAGAGGCTGGCCCACGTGAGCCACGAGTACCTGATCCTGATC DLGFDGTSDRDYEIGACCTGGGCTTCGACGGCACCAGCGACAGGGACTACGAGATCC QTAELFTRELDFLGAGACCGCCGAGCTGTTCACCAGGGAGCTGGACTTCCTGGGCGGC GRLGDTRKPDVCIYAGGCTGGGCGACACCAGGAAGCCCGACGTGTGCATCTACTACG YGKDGMIIDNKAYGCAAGGACGGCATGATCATCGACAACAAGGCCTACGGCAAGGG GKGYSLPIKQADEMCTACAGCCTGCCCATCAAGCAGGCCGACGAGATGTACAGGTACC 
YRYLEENKERNEKITGGAGGAGAACAAGGAGAGGAACGAGAAGATCAACCCCAACA NPNRWWKVFDEGVGGTGGTGGAAGGTGTTCGACGAGGGCGTGACCGACTACAGGTT TDYRFAFVSGSFTGCGCCTTCGTGAGCGGCAGCTTCACCGGCGGCTTCAAGGACAGGC GFKDRLENIHMRSGTGGAGAACATCCACATGAGGAGCGGCCTGTGCGGCGGCGCCAT LCGGAIDSVTLLLLCGACAGCGTGACCCTGCTGCTGCTGGCCGAGGAGCTGAAGGCC AEELKAGRMEYSEFGGCAGGATGGAGTACAGCGAGTTCTTCAGGCTGTTCGACTGCAA FRLFDCNDEVTFCGACGAGGTGACCTTC 24 ELKDKAADAVKAK 105GAGCTGAAGGACAAGGCCGCCGACGCCGTGAAGGCCAAGTTCC FLKLTGLSMKYIELTGAAGCTGACCGGCCTGAGCATGAAGTACATCGAGCTGCTGGAC LDIAYDSSRNRDFEIATCGCCTACGACAGCAGCAGGAACAGGGACTTCGAGATCCTGA LTADLFKNVYGLDCCGCCGACCTGTTCAAGAACGTGTACGGCCTGGACGCCATGCAC AMHLGGGRKPDAICTGGGCGGCGGCAGGAAGCCCGACGCCATCGCCCAGACCAGCC AQTSHFGIIIDTKAYACTTCGGCATCATCATCGACACCAAGGCCTACGGCAACGGCTAC GNGYSKSISQEDEMAGCAAGAGCATCAGCCAGGAGGACGAGATGGTGAGGTACATCG VRYIEDNQQRSITRAGGACAACCAGCAGAGGAGCATCACCAGGAACAGCGTGGAGTG NSVEWWKNFNSSIPGTGGAAGAACTTCAACAGCAGCATCCCCAGCACCGCCTTCTACT STAFYFLWVSSKFVTCCTGTGGGTGAGCAGCAAGTTCGTGGGCAAGTTCGACGACCAG GKFDDQLLATYNRCTGCTGGCCACCTACAACAGGACCAACACCTGCGGCGGCGCCCT TNTCGGALNVEQLLGAACGTGGAGCAGCTGCTGATCGGCGCCTACAAGGTGAAGGCC IGAYKVKAGLLGIGGGCCTGCTGGGCATCGGCCAGATCCCCAGCTACTTCAAGAACAA QIPSYFKNKEIAW GGAGATCGCCTGG25 ISVKSDMAVVKDSV 106 ATCAGCGTGAAGAGCGACATGGCCGTGGTGAAGGACAGCGTGARERLAHVSHEYLLL GGGAGAGGCTGGCCCACGTGAGCCACGAGTACCTGCTGCTGATCIDLGFDGTSDRDYEI GACCTGGGCTTCGACGGCACCAGCGACAGGGACTACGAGATCCQTAELLTRELDFLG AGACCGCCGAGCTGCTGACCAGGGAGCTGGACTTCCTGGGCGGGRLGDTRKPDVCIY CAGGCTGGGCGACACCAGGAAGCCCGACGTGTGCATCTACTAC YGKDGMIIDNKAYGGCAAGGACGGCATGATCATCGACAACAAGGCCTACGGCAAGG GKGYSLPIKQADEMGCTACAGCCTGCCCATCAAGCAGGCCGACGAGATGTACAGGTA YRYLEENKERNEKICCTGGAGGAGAACAAGGAGAGGAACGAGAAGATCAACCCCAAC NPNRWWKVFDEGVAGGTGGTGGAAGGTGTTCGACGAGGGCGTGACCGACTACAGGT TDYRFAFVSGSFTGTCGCCTTCGTGAGCGGCAGCTTCACCGGCGGCTTCAAGGACAGG GFKDRLENIHMRSGCTGGAGAACATCCACATGAGGAGCGGCCTGTGCGGCGGCGCCA LCGGAIDSVTLLLLTCGACAGCGTGACCCTGCTGCTGCTGGCCGAGGAGCTGAAGGCC AEELKAGRMEYSEFGGCAGGATGGAGTACAGCGAGTTCTTCAGGCTGTTCGACTGCAA FRLFDCNDEVTFCGACGAGGTGACCTTC 26 ELKDEQAEKRKAK 
107GAGCTGAAGGACGAGCAGGCCGAGAAGAGGAAGGCCAAGTTCC FLKETNLPMKYIELTGAAGGAGACCAACCTGCCCATGAAGTACATCGAGCTGCTGGA LDIAYDGKRNRDFECATCGCCTACGACGGCAAGAGGAACAGGGACTTCGAGATCGTG IVTMELFRNVYRLHACCATGGAGCTGTTCAGGAACGTGTACAGGCTGCACAGCAAGCT SKLLGGGRKPDGLLGCTGGGCGGCGGCAGGAAGCCCGACGGCCTGCTGTACCAGGAC YQDRFGVIVDTKAYAGGTTCGGCGTGATCGTGGACACCAAGGCCTACGGCAAGGGCT GKGYSKSINQADEACAGCAAGAGCATCAACCAGGCCGACGAGATGATCAGGTACAT MIRYIEDNKRRDENCGAGGACAACAAGAGGAGGGACGAGAACAGGAACCCCATCAA RNPIKWWEAFPDTIGTGGTGGGAGGCCTTCCCCGACACCATCCCCCAGGAGGAGTTCT PQEEFYFMWVSSKFACTTCATGTGGGTGAGCAGCAAGTTCATCGGCAAGTTCCAGGAG IGKFQEQLDYTSNECAGCTGGACTACACCAGCAACGAGACCCAGATCAAGGGCGCCG TQIKGAALNVEQLLCCCTGAACGTGGAGCAGCTGCTGCTGGGCGCCGACCTGGTGCTG LGADLVLKGQLHISAAGGGCCAGCTGCACATCAGCGACCTGCCCAGCTACTTCCAGAA DLPSYFQNKEIEFCAAGGAGATCGAGTTC 27 RNLDNVERDNRKA 108AGGAACCTGGACAACGTGGAGAGGGACAACAGGAAGGCCGAGT EFLAKTSLPPRFIELTCCTGGCCAAGACCAGCCTGCCCCCCAGGTTCATCGAGCTGCTG LSIAYESKSNRDFEAGCATCGCCTACGAGAGCAAGAGCAACAGGGACTTCGAGATGA MITAELFKDVYGLGTCACCGCCGAGCTGTTCAAGGACGTGTACGGCCTGGGCGCCGTG AVHLGNAKKPDALCACCTGGGCAACGCCAAGAAGCCCGACGCCCTGGCCTTCAACG AFNDDFGIIIDTKAYACGACTTCGGCATCATCATCGACACCAAGGCCTACAGCAACGGC SNGYSKNINQEDEMTACAGCAAGAACATCAACCAGGAGGACGAGATGGTGAGGTACA VRYIEDNQIRSPDRTCGAGGACAACCAGATCAGGAGCCCCGACAGGAACAACAACGA NNNEWWLSFPPSIPGTGGTGGCTGAGCTTCCCCCCCAGCATCCCCGAGAACGACTTCC ENDFHFLWVSSYFTACTTCCTGTGGGTGAGCAGCTACTTCACCGGCAGGTTCGAGGAG GRFEEQLQETSARTCAGCTGCAGGAGACCAGCGCCAGGACCGGCGGCACCACCGGCG GGTTGGALDVEQLGCGCCCTGGACGTGGAGCAGCTGCTGATCGGCGGCAGCCTGATC LIGGSLIQEGSLAPHCAGGAGGGCAGCCTGGCCCCCCACGAGGTGCCCGCCTACATGC EVPAYMQNRVIHFAGAACAGGGTGATCCACTTC 28 SPVKSEVSVFKDYL 109AGCCCCGTGAAGAGCGAGGTGAGCGTGTTCAAGGACTACCTGA RTHLTHVDHRYLILGGACCCACCTGACCCACGTGGACCACAGGTACCTGATCCTGGTG VDLGFDGSSDRDYEGACCTGGGCTTCGACGGCAGCAGCGACAGGGACTACGAGATGA MKTAELFTAELGFAGACCGCCGAGCTGTTCACCGCCGAGCTGGGCTTCATGGGCGCC MGARLGDTRKPDVAGGCTGGGCGACACCAGGAAGCCCGACGTGTGCGTGTACCACG CVYHGAHGLIIDNKGCGCCCACGGCCTGATCATCGACAACAAGGCCTACGGCAAGGG AYGKGYSLPIKQADCTACAGCCTGCCCATCAAGCAGGCCGACGAGATCTACAGGTACA 
EIYRYIEENKERAVTCGAGGAGAACAAGGAGAGGGCCGTGAGGCTGAACCCCAACCA RLNPNQWWKVFDEGTGGTGGAAGGTGTTCGACGAGAGCGTGGCCCACTTCAGGTTCG SVAHFRFAFISGSFTCCTTCATCAGCGGCAGCTTCACCGGCGGCTTCAAGGACAGGATC GGFKDRIELISMRSGGAGCTGATCAGCATGAGGAGCGGCATCTGCGGCGCCGCCGTGA ICGAAVNSVNLLLMACAGCGTGAACCTGCTGCTGATGGCCGAGGAGCTGAAGAGCGG AEELKSGRLNYEECAGGCTGAACTACGAGGAGTGGTTCCAGTACTTCGACTGCAACG WFQYFDCNDEISLACGAGATCAGCCTG 29 TLVDIEKERKKAYF 110ACCCTGGTGGACATCGAGAAGGAGAGGAAGAAGGCCTACTTCC LKETSLSPRYIELLEITGAAGGAGACCAGCCTGAGCCCCAGGTACATCGAGCTGCTGGA AFDPKRNRDFEVITGATCGCCTTCGACCCCAAGAGGAACAGGGACTTCGAGGTGATC AELLKAGYGLKAKACCGCCGAGCTGCTGAAGGCCGGCTACGGCCTGAAGGCCAAGG VLGGGRRPDGIAYTTGCTGGGCGGCGGCAGGAGGCCCGACGGCATCGCCTACACCAA KDYGLIVDTKAYSNGGACTACGGCCTGATCGTGGACACCAAGGCCTACAGCAACGGC GYGKNIGQADEMIRTACGGCAAGAACATCGGCCAGGCCGACGAGATGATCAGGTACA YIEDNQKRDNKRNPTCGAGGACAACCAGAAGAGGGACAACAAGAGGAACCCCATCGA IEWWREFEVQIPANGTGGTGGAGGGAGTTCGAGGTGCAGATCCCCGCCAACAGCTACT SYYYLWVSGRFTGACTACCTGTGGGTGAGCGGCAGGTTCACCGGCAGGTTCGACGAG RFDEQLVYTSSQTNCAGCTGGTGTACACCAGCAGCCAGACCAACACCAGGGGCGGCG TRGGALEVEQLLWCCCTGGAGGTGGAGCAGCTGCTGTGGGGCGCCGACGCCGTGAT GADAVMKGKLNVSGAAGGGCAAGCTGAACGTGAGCGACCTGCCCAAGTACATGAAC DLPKYMNNSIIKLAACAGCATCATCAAGCTG 30 ELRDKVIEEQKAIFL 111GAGCTGAGGGACAAGGTGATCGAGGAGCAGAAGGCCATCTTCC QKTKLPLSYIELLEITGCAGAAGACCAAGCTGCCCCTGAGCTACATCGAGCTGCTGGAG ARDGKRSRDFELITIATCGCCAGGGACGGCAAGAGGAGCAGGGACTTCGAGCTGATCA ELFKNIYKINARILGCCATCGAGCTGTTCAAGAACATCTACAAGATCAACGCCAGGATC GARKPDGVLYMPECTGGGCGGCGCCAGGAAGCCCGACGGCGTGCTGTACATGCCCG FGVIVDTKAYADGAGTTCGGCGTGATCGTGGACACCAAGGCCTACGCCGACGGCTAC YSKSIAQADEMIRYIAGCAAGAGCATCGCCCAGGCCGACGAGATGATCAGGTACATCG EDNKRRDPSRNSTKAGGACAACAAGAGGAGGGACCCCAGCAGGAACAGCACCAAGT WWEHFPTSIPANNFGGTGGGAGCACTTCCCCACCAGCATCCCCGCCAACAACTTCTAC YFLWVSSVFVNKFHTTCCTGTGGGTGAGCAGCGTGTTCGTGAACAAGTTCCACGAGCA EQLSYTAQETQTVGGCTGAGCTACACCGCCCAGGAGACCCAGACCGTGGGCGCCGCC AALSVEQLLLGADSCTGAGCGTGGAGCAGCTGCTGCTGGGCGCCGACAGCGTGCTGA VLKGNLTTEKFIDSFAGGGCAACCTGACCACCGAGAAGTTCATCGACAGCTTCAAGAA KNQEIVF CCAGGAGATCGTGTTC 31GATKSDLSLLKDDI 112 
GGCGCCACCAAGAGCGACCTGAGCCTGCTGAAGGACGACATCARKKLNHINHKYLVL GGAAGAAGCTGAACCACATCAACCACAAGTACCTGGTGCTGATIDLGFDGTADRDYE CGACCTGGGCTTCGACGGCACCGCCGACAGGGACTACGAGCTGLQTADLLTSELAFK CAGACCGCCGACCTGCTGACCAGCGAGCTGGCCTTCAAGGGCGCGARLGDSRKPDVC CAGGCTGGGCGACAGCAGGAAGCCCGACGTGTGCGTGTACCAC VYHDKNGLIIDNKAGACAAGAACGGCCTGATCATCGACAACAAGGCCTACGGCAGCG YGSGYSLPIKQADEGCTACAGCCTGCCCATCAAGCAGGCCGACGAGATGCTGAGGTA MLRYIEENQKRDKCATCGAGGAGAACCAGAAGAGGGACAAGGCCCTGAACCCCAAC ALNPNEWWTIFDDGAGTGGTGGACCATCTTCGACGACGCCGTGAGCAAGTTCAACTT AVSKFNFAFVSGEFCGCCTTCGTGAGCGGCGAGTTCACCGGCGGCTTCAAGGACAGGC TGGFKDRLENISRRTGGAGAACATCAGCAGGAGGAGCTACACCAACGGCGCCGCCAT SYTNGAAINSVNLLCAACAGCGTGAACCTGCTGCTGCTGGCCGAGGAGATCAAGAGC LLAEEIKSGRISYGDGGCAGGATCAGCTACGGCGACGCCTTCACCAAGTTCGAGTGCAA AFTKFECNDEIIICGACGAGATCATCATC 32 ELRNAALDKQKVN 113GAGCTGAGGAACGCCGCCCTGGACAAGCAGAAGGTGAACTTCA FINKTGLPMKYIELLTCAACAAGACCGGCCTGCCCATGAAGTACATCGAGCTGCTGGAG EIAFDGSRNRDFEMATCGCCTTCGACGGCAGCAGGAACAGGGACTTCGAGATGGTGA VTADLFKNVYGFNSCCGCCGACCTGTTCAAGAACGTGTACGGCTTCAACAGCATCCTG ILLGGGRKPDGLIFTCTGGGCGGCGGCAGGAAGCCCGACGGCCTGATCTTCACCGACA DRFGVIIDTKAYGNGGTTCGGCGTGATCATCGACACCAAGGCCTACGGCAACGGCTAC GYSKSIGQEDEMVRAGCAAGAGCATCGGCCAGGAGGACGAGATGGTGAGGTACATCG YIEDNQLRDSNRNSAGGACAACCAGCTGAGGGACAGCAACAGGAACAGCGTGGAGTG VEWWKNFDEKIESEGTGGAAGAACTTCGACGAGAAGATCGAGAGCGAGAACTTCTAC NFYFMWISSKFIGQTTCATGTGGATCAGCAGCAAGTTCATCGGCCAGTTCAGCGACCA FSDQLQSTSDRTNTGCTGCAGAGCACCAGCGACAGGACCAACACCAAGGGCGCCGCC KGAALNVEQLLLGCTGAACGTGGAGCAGCTGCTGCTGGGCGCCGCCGCCGCCAGGG AAAARDGKLDINSLACGGCAAGCTGGACATCAACAGCCTGCCCATCTACATGAACAAC PIYMNNKEILW AAGGAGATCCTGTGG33 ELKDEQSEKRKAYF 114 GAGCTGAAGGACGAGCAGAGCGAGAAGAGGAAGGCCTACTTCCLKETNLPLKYIELLD TGAAGGAGACCAACCTGCCCCTGAAGTACATCGAGCTGCTGGACIAYDGKRNRDFEIV ATCGCCTACGACGGCAAGAGGAACAGGGACTTCGAGATCGTGATMELFRNVYRLQSK CCATGGAGCTGTTCAGGAACGTGTACAGGCTGCAGAGCAAGCT LLGGVRKPDGLLYGCTGGGCGGCGTGAGGAAGCCCGACGGCCTGCTGTACAAGCAC KHRFGIIVDTKAYGAGGTTCGGCATCATCGTGGACACCAAGGCCTACGGCGAGGGCT EGYSKSISQADEMIACAGCAAGAGCATCAGCCAGGCCGACGAGATGATCAGGTACAT 
RYIEDNKRRDENRNCGAGGACAACAAGAGGAGGGACGAGAACAGGAACAGCACCAA STKWWEHFPDCIPKGTGGTGGGAGCACTTCCCCGACTGCATCCCCAAGCAGAGCTTCT QSFYFMWVSSKFVACTTCATGTGGGTGAGCAGCAAGTTCGTGGGCAAGTTCCAGGAG GKFQEQLDYTANETCAGCTGGACTACACCGCCAACGAGACCAAGACCAACGGCGCCG KTNGAALNVEQLLCCCTGAACGTGGAGCAGCTGCTGTGGGGCGCCGACCTGGTGGCC WGADLVAKGKLDIAAGGGCAAGCTGGACATCAGCCAGCTGCCCAGCTACTTCCAGA SQLPSYFQNKEIEFACAAGGAGATCGAGTTC 34 HNNKFKNYLRENSE 115CACAACAACAAGTTCAAGAACTACCTGAGGGAGAACAGCGAGC LSFKFIELIDIAYDGTGAGCTTCAAGTTCATCGAGCTGATCGACATCGCCTACGACGGC NRNRDMEIITAELLAACAGGAACAGGGACATGGAGATCATCACCGCCGAGCTGCTGA KEIYGLNVKLLGGGAGGAGATCTACGGCCTGAACGTGAAGCTGCTGGGCGGCGGCAG RKPDILAYTDDIGIIIGAAGCCCGACATCCTGGCCTACACCGACGACATCGGCATCATCA DTKAYKDGYGKQITCGACACCAAGGCCTACAAGGACGGCTACGGCAAGCAGATCAA NQADEMIRYIEDNQCCAGGCCGACGAGATGATCAGGTACATCGAGGACAACCAGAGG RRDLIRNPNEWWRAGGGACCTGATCAGGAACCCCAACGAGTGGTGGAGGTACTTCC YFPKSISKEKIYFMCCAAGAGCATCAGCAAGGAGAAGATCTACTTCATGTGGATCAG WISSYFKNNFYEQVCAGCTACTTCAAGAACAACTTCTACGAGCAGGTGCAGTACACCG QYTAQETKSIGAALCCCAGGAGACCAAGAGCATCGGCGCCGCCCTGAACGTGAGGCA NVRQLLLCADAIQKGCTGCTGCTGTGCGCCGACGCCATCCAGAAGGAGGTGCTGAGCC EVLSLDTFLGSFRNTGGACACCTTCCTGGGCAGCTTCAGGAACGAGGAGATCAACCTG EEINL 35 LPVKSEVSILKDYL 116CTGCCCGTGAAGAGCGAGGTGAGCATCCTGAAGGACTACCTGA RSHLTHIDHKYLILVGGAGCCACCTGACCCACATCGACCACAAGTACCTGATCCTGGTG DLGYDGTSDRDYEIGACCTGGGCTACGACGGCACCAGCGACAGGGACTACGAGATCC QTAQLLTAELSFLGAGACCGCCCAGCTGCTGACCGCCGAGCTGAGCTTCCTGGGCGGC GRLGDTRKPDVCIYAGGCTGGGCGACACCAGGAAGCCCGACGTGTGCATCTACTACG YEDNGLIIDNKAYGAGGACAACGGCCTGATCATCGACAACAAGGCCTACGGCAAGGG KGYSLPMKQADEMCTACAGCCTGCCCATGAAGCAGGCCGACGAGATGTACAGGTAC YRYIEENKERSELLATCGAGGAGAACAAGGAGAGGAGCGAGCTGCTGAACCCCAACT NPNCWWNIFDKDVGCTGGTGGAACATCTTCGACAAGGACGTGAAGACCTTCCACTTC KTFHFAFLSGEFTGGCCTTCCTGAGCGGCGAGTTCACCGGCGGCTTCAGGGACAGGCT GFRDRLNHISMRSGGAACCACATCAGCATGAGGAGCGGCATGAGGGGCGCCGCCGTG MRGAAVNSANLLIAACAGCGCCAACCTGCTGATCATGGCCGAGAAGCTGAAGGCCG MAEKLKAGTMEYEGCACCATGGAGTACGAGGAGTTCTTCAGGCTGTTCGACACCAAC EFFRLFDTNDEILFGACGAGATCCTGTTC 36 LPVKSQVSILKDYL 
117CTGCCCGTGAAGAGCCAGGTGAGCATCCTGAAGGACTACCTGA RSYLSHVDHKYLILGGAGCTACCTGAGCCACGTGGACCACAAGTACCTGATCCTGCTG LDLGFDGTSDRDYEGACCTGGGCTTCGACGGCACCAGCGACAGGGACTACGAGATCT IWTAQLLTAELSFLGGACCGCCCAGCTGCTGACCGCCGAGCTGAGCTTCCTGGGCGGC GGRLGDTRKPDVCIAGGCTGGGCGACACCAGGAAGCCCGACGTGTGCATCTACTACG YYEDNGLIIDNKAYAGGACAACGGCCTGATCATCGACAACAAGGCCTACGGCAAGGG GKGYSLPIKQADEMCTACAGCCTGCCCATCAAGCAGGCCGACGAGATGTACAGGTAC YRYIEENKERSDLLATCGAGGAGAACAAGGAGAGGAGCGACCTGCTGAACCCCAACT NPNCWWNIFGEGVGCTGGTGGAACATCTTCGGCGAGGGCGTGAAGACCTTCAGGTTC KTFRFAFLSGEFTGGCCTTCCTGAGCGGCGAGTTCACCGGCGGCTTCAAGGACAGGCT GFKDRLNHISMRSGGAACCACATCAGCATGAGGAGCGGCATCAAGGGCGCCGCCGTG IKGAAVNSANLLIMAACAGCGCCAACCTGCTGATCATGGCCGAGCAGCTGAAGAGCG AEQLKSGTMSYEEFGCACCATGAGCTACGAGGAGTTCTTCCAGCTGTTCGACTACAAC FQLFDYNDEIIFGACGAGATCATCTTC 37 VSKTNILELKDNTR 118GTGAGCAAGACCAACATCCTGGAGCTGAAGGACAACACCAGGG EKLVYLDHRYLSLFAGAAGCTGGTGTACCTGGACCACAGGTACCTGAGCCTGTTCGAC DLAYDDKASRDFEICTGGCCTACGACGACAAGGCCAGCAGGGACTTCGAGATCCAGA QTIDLLINELQFKGLCCATCGACCTGCTGATCAACGAGCTGCAGTTCAAGGGCCTGAGG RLGERRKPDGIISYGCTGGGCGAGAGGAGGAAGCCCGACGGCATCATCAGCTACGGCG VNGVIIDNKAYSKGTGAACGGCGTGATCATCGACAACAAGGCCTACAGCAAGGGCTA YNLPIRQADEMIRYICAACCTGCCCATCAGGCAGGCCGACGAGATGATCAGGTACATCC QENQSRDEKLNPNKAGGAGAACCAGAGCAGGGACGAGAAGCTGAACCCCAACAAGTG WWENFEEETSKFNGTGGGAGAACTTCGAGGAGGAGACCAGCAAGTTCAACTACCTG YLFISSKFISGFKKNTTCATCAGCAGCAAGTTCATCAGCGGCTTCAAGAAGAACCTGCA LQYIADRTGVNGGGTACATCGCCGACAGGACCGGCGTGAACGGCGGCGCCATCAAC AINVENLLCFAEMLGTGGAGAACCTGCTGTGCTTCGCCGAGATGCTGAAGAGCGGCA KSGKLEYNDFFNQYAGCTGGAGTACAACGACTTCTTCAACCAGTACAACAACGACGA NNDEIIM GATCATCATG 38LPVKSQVSILKDYL 119 CTGCCCGTGAAGAGCCAGGTGAGCATCCTGAAGGACTACCTGARSCLSHVDHKYLIL GGAGCTGCCTGAGCCACGTGGACCACAAGTACCTGATCCTGCTGLDLGFDGTSDRDYE GACCTGGGCTTCGACGGCACCAGCGACAGGGACTACGAGATCCIQTAQLLTAELSFLG AGACCGCCCAGCTGCTGACCGCCGAGCTGAGCTTCCTGGGCGGCGRLGDTRKPDVCIY AGGCTGGGCGACACCAGGAAGCCCGACGTGTGCATCTACTACGYEDNGLIIDNKAYG AGGACAACGGCCTGATCATCGACAACAAGGCCTACGGCAAGGGKGYSLPIKQADEMY CTACAGCCTGCCCATCAAGCAGGCCGACGAGATGTACAGGTACRYIEENKERSELLNP 
ATCGAGGAGAACAAGGAGAGGAGCGAGCTGCTGAACCCCAACTNCWWNIFDEGVKT GCTGGTGGAACATCTTCGACGAGGGCGTGAAGACCTTCAGGTTCFRFAFLSGEFTGGF GCCTTCCTGAGCGGCGAGTTCACCGGCGGCTTCAAGGACAGGCTKDRLNHISMRSGIK GAACCACATCAGCATGAGGAGCGGCATCAAGGGCGCCGCCGTGGAAVNSANLLIIAE AACAGCGCCAACCTGCTGATCATCGCCGAGCAGCTGAAGAGCGQLKSGTMSYEEFFQ GCACCATGAGCTACGAGGAGTTCTTCCAGCTGTTCGACCAGAAC LFDQNDEITVGACGAGATCACCGTG 39 MSSKSEISVIKDNIR 120ATGAGCAGCAAGAGCGAGATCAGCGTGATCAAGGACAACATCA KRLNHINHKYLVLIGGAAGAGGCTGAACCACATCAACCACAAGTACCTGGTGCTGAT DLGFDGTADRDYECGACCTGGGCTTCGACGGCACCGCCGACAGGGACTACGAGCTG LQTADLLTSELSFKCAGACCGCCGACCTGCTGACCAGCGAGCTGAGCTTCAAGGGCG GARLGDTRKPDVCCCAGGCTGGGCGACACCAGGAAGCCCGACGTGTGCGTGTACCA VYHGTNGLIIDNKACGGCACCAACGGCCTGATCATCGACAACAAGGCCTACGGCAAG YGKGYSLPIKQADEGGCTACAGCCTGCCCATCAAGCAGGCCGACGAGATGCTGAGGT MLRYIEENQKRDKSACATCGAGGAGAACCAGAAGAGGGACAAGAGCCTGAACCCCAA LNPNEWWTIFDDACGAGTGGTGGACCATCTTCGACGACGCCGTGAGCAAGTTCAACT VSKFNFAFVSGEFTTCGCCTTCGTGAGCGGCGAGTTCACCGGCGGCTTCAAGGACAGG GGFKDRLENISRRSSCTGGAGAACATCAGCAGGAGGAGCAGCGTGAACGGCGCCGCCA VNGAAINSVNLLLLTCAACAGCGTGAACCTGCTGCTGCTGGCCGAGGAGATCAAGAG AEEIKSGRMSYSDACGGCAGGATGAGCTACAGCGACGCCTTCAAGAACTTCGACTGCA FKNFDCNKEITIACAAGGAGATCACCATC 40 RNLDKVERDSRKA 121AGGAACCTGGACAAGGTGGAGAGGGACAGCAGGAAGGCCGAG EFLAKTSLPPRFIELTTCCTGGCCAAGACCAGCCTGCCCCCCAGGTTCATCGAGCTGCT LSIAYESKSNRDFEGAGCATCGCCTACGAGAGCAAGAGCAACAGGGACTTCGAGATG MITAEFFKDVYGLGATCACCGCCGAGTTCTTCAAGGACGTGTACGGCCTGGGCGCCGT AVHLGNARKPDALGCACCTGGGCAACGCCAGGAAGCCCGACGCCCTGGCCTTCACCG AFTDNFGIVIDTKAACAACTTCGGCATCGTGATCGACACCAAGGCCTACAGCAACGGC YSNGYSKNINQEDETACAGCAAGAACATCAACCAGGAGGACGAGATGGTGAGGTACA MVRYIEDNQIRSPETCGAGGACAACCAGATCAGGAGCCCCGAGAGGAACAAGAACGA RNKNEWWLSFPPSIGTGGTGGCTGAGCTTCCCCCCCAGCATCCCCGAGAACAACTTCC PENNFHFLWVSSYFACTTCCTGTGGGTGAGCAGCTACTTCACCGGCTACTTCGAGGAG TGYFEEQLQETSDRCAGCTGCAGGAGACCAGCGACAGGGCCGGCGGCATGACCGGCG AGGMTGGALDIEQGCGCCCTGGACATCGAGCAGCTGCTGATCGGCGGCAGCCTGGTG LLIGGSLVQEGKLACAGGAGGGCAAGCTGGCCCCCCACGACATCCCCGAGTACATGC PHDIPEYMQNRVIHAGAACAGGGTGATCCACTTC F 41 APVKSEVSLCKDIL 
122GCCCCCGTGAAGAGCGAGGTGAGCCTGTGCAAGGACATCCTGA RSHLTHVDHKYLILGGAGCCACCTGACCCACGTGGACCACAAGTACCTGATCCTGCTG LDLGFDGTSDRDYEGACCTGGGCTTCGACGGCACCAGCGACAGGGACTACGAGATCC IQTAQLLTAELDFKAGACCGCCCAGCTGCTGACCGCCGAGCTGGACTTCAAGGGCGCC GARLGDTRKPDVCAGGCTGGGCGACACCAGGAAGCCCGACGTGTGCGTGTACTACG VYYGEDGLILDNKAGCGAGGACGGCCTGATCCTGGACAACAAGGCCTACGGCAAGGG YGKGYSLPIKQADECTACAGCCTGCCCATCAAGCAGGCCGACGAGATGTACAGGTAC MYRYIEENKERNERATCGAGGAGAACAAGGAGAGGAACGAGAGGCTGAACCCCAAC LNPNKWWEIFDKDAAGTGGTGGGAGATCTTCGACAAGGACGTGGTGAGGTACCACTT VVRYHFAFVSGTFTCGCCTTCGTGAGCGGCACCTTCACCGGCGGCTTCAAGGAGAGGC GGFKERLDNIRMRSTGGACAACATCAGGATGAGGAGCGGCATCTGCGGCGCCGCCGT GICGAAVNSMNLLLGAACAGCATGAACCTGCTGCTGATGGCCGAGGAGCTGAAGAGC MAEELKSGRLGYKGGCAGGCTGGGCTACAAGGAGTGCTTCGCCCTGTTCGACTGCAA ECFALFDCNDEIAFCGACGAGATCGCCTTC 42 SCVKDEVNDIVDRV 123AGCTGCGTGAAGGACGAGGTGAACGACATCGTGGACAGGGTGA RVKLKNIDHKYLILIGGGTGAAGCTGAAGAACATCGACCACAAGTACCTGATCCTGATC SLAYSDETERTKKNAGCCTGGCCTACAGCGACGAGACCGAGAGGACCAAGAAGAACA SDARDFEIQTAELFTGCGACGCCAGGGACTTCGAGATCCAGACCGCCGAGCTGTTCACC KELGFNGIRLGESNAAGGAGCTGGGCTTCAACGGCATCAGGCTGGGCGAGAGCAACA KPDVLISFGANGTIIAGCCCGACGTGCTGATCAGCTTCGGCGCCAACGGCACCATCATC DNKSYKDGFNIPRVGACAACAAGAGCTACAAGGACGGCTTCAACATCCCCAGGGTGA TSDQMIRYINENNQCCAGCGACCAGATGATCAGGTACATCAACGAGAACAACCAGAG RTTQLNPNEWWKNGACCACCCAGCTGAACCCCAACGAGTGGTGGAAGAACTTCGAC FDSSVSNYTFLFVTSAGCAGCGTGAGCAACTACACCTTCCTGTTCGTGACCAGCTTCCT FLKGSFKNQIEYISNGAAGGGCAGCTTCAAGAACCAGATCGAGTACATCAGCAACGCC ATNGTRGAAINVESACCAACGGCACCAGGGGCGCCGCCATCAACGTGGAGAGCCTGC LLYISEDIKSGKIKQTGTACATCAGCGAGGACATCAAGAGCGGCAAGATCAAGCAGAG SDFYSEFKNDEIVYCGACTTCTACAGCGAGTTCAAGAACGACGAGATCGTGTAC 43 SQGDKAREQLKAK 124AGCCAGGGCGACAAGGCCAGGGAGCAGCTGAAGGCCAAGTTCC FLAKTNLLPRYVELTGGCCAAGACCAACCTGCTGCCCAGGTACGTGGAGCTGCTGGAC LDIAYDSKRNRDFEATCGCCTACGACAGCAAGAGGAACAGGGACTTCGAGATGGTGA MVTAELFNFAYLLPCCGCCGAGCTGTTCAACTTCGCCTACCTGCTGCCCGCCGTGCAC AVHLGGVRKPDALCTGGGCGGCGTGAGGAAGCCCGACGCCCTGGTGGCCACCAAGA VATKKFGIIVDTKAAGTTCGGCATCATCGTGGACACCAAGGCCTACGCCAACGGCTAC 
YANGYSRNANQADAGCAGGAACGCCAACCAGGCCGACGAGATGGCCAGGTACATCA EMARYITENQKRDPCCGAGAACCAGAAGAGGGACCCCAAGACCAACCCCAACAGGTG KTNPNRWWDNFDAGTGGGACAACTTCGACGCCAGGATCCCCCCCAACGCCTACTACT RIPPNAYYFLWVSSTCCTGTGGGTGAGCAGCTTCTTCACCGGCCAGTTCGACGACCAG FFTGQFDDQLSYTACTGAGCTACACCGCCCACAGGACCAACACCCACGGCGGCGCCCT HRTNTHGGALNVEGAACGTGGAGCAGCTGCTGATCGGCGCCAACATGATCCAGACC QLLIGANMIQTGQLGGCCAGCTGGACAGGAACAAGCTGCCCGAGTACATGCAGGACA DRNKLPEYMQDKEIAGGAGATCACCTTC TF 44 KVQKSNILDVIEKC 125AAGGTGCAGAAGAGCAACATCCTGGACGTGATCGAGAAGTGCA REKINNIPHEYLALIGGGAGAAGATCAACAACATCCCCCACGAGTACCTGGCCCTGATC PMSFDENESTMFEICCCATGAGCTTCGACGAGAACGAGAGCACCATGTTCGAGATCA KTIELLTEHCKFDGAGACCATCGAGCTGCTGACCGAGCACTGCAAGTTCGACGGCCTG LHCGGASKPDGLIYCACTGCGGCGGCGCCAGCAAGCCCGACGGCCTGATCTACAGCG SEDYGVIIDTKSYKAGGACTACGGCGTGATCATCGACACCAAGAGCTACAAGGACGG DGFNIQTPERDKMKCTTCAACATCCAGACCCCCGAGAGGGACAAGATGAAGAGGTAC RYIEENQNRNPQHNATCGAGGAGAACCAGAACAGGAACCCCCAGCACAACAAGACCA KTRWWDEFPHNISNGGTGGTGGGACGAGTTCCCCCACAACATCAGCAACTTCCTGTTC FLFLFVSGKFGGNFCTGTTCGTGAGCGGCAAGTTCGGCGGCAACTTCAAGGAGCAGCT KEQLRILSEQTNNTGAGGATCCTGAGCGAGCAGACCAACAACACCCTGGGCGGCGCC LGGALSSYVLLNIACTGAGCAGCTACGTGCTGCTGAACATCGCCGAGCAGATCGCCAT EQIAINKIDHCDFKTCAACAAGATCGACCACTGCGACTTCAAGACCAGGATCAGCTGCC RISCLDEVA TGGACGAGGTGGCC 45VPVKSEVSLCKDYL 126 GTGCCCGTGAAGAGCGAGGTGAGCCTGTGCAAGGACTACCTGARSYLTHVDHKYLIL GGAGCTACCTGACCCACGTGGACCACAAGTACCTGATCCTGCTGLDLGFDGTSDRDYE GACCTGGGCTTCGACGGCACCAGCGACAGGGACTACGAGATCCIQTAQLLTAELDFK AGACCGCCCAGCTGCTGACCGCCGAGCTGGACTTCAAGGGCGCCGARLGDTRKPDVC AGGCTGGGCGACACCAGGAAGCCCGACGTGTGCGTGTACTACG VYYGEDGLIIDNKAGCGAGGACGGCCTGATCATCGACAACAAGGCCTACGGCAAGGG YGKGYSLPIKQADECTACAGCCTGCCCATCAAGCAGGCCGACGAGATCTACAGGTACA IYRYIEENKKRDEKTCGAGGAGAACAAGAAGAGGGACGAGAAGCTGAACCCCAACA LNPNKWWEIFDKGAGTGGTGGGAGATCTTCGACAAGGGCGTGGTGAGGTACCACTTC VVRYHFAFVSGAFTGCCTTCGTGAGCGGCGCCTTCACCGGCGGCTTCAAGGAGAGGCT GGFKERLDNIRMRSGGACAACATCAGGATGAGGAGCGGCATCTGCGGCGCCGCCATC GICGAAINSMNLLLAACAGCATGAACCTGCTGCTGATGGCCGAGGAGCTGAAGAGCG MAEELKSGRLGYEEGCAGGCTGGGCTACGAGGAGTGCTTCGCCCTGTTCGACTGCAAC 
CFALFDCNDEITFGACGAGATCACCTTC 46 VPVKSEVSLCKDYL 127GTGCCCGTGAAGAGCGAGGTGAGCCTGTGCAAGGACTACCTGA RSHLNHVDHRYLILGGAGCCACCTGAACCACGTGGACCACAGGTACCTGATCCTGCTG LDLGFDGTSDRDYEGACCTGGGCTTCGACGGCACCAGCGACAGGGACTACGAGATCC IQTAQLLTGELNFKAGACCGCCCAGCTGCTGACCGGCGAGCTGAACTTCAAGGGCGC GARLGDTRKPDVCCAGGCTGGGCGACACCAGGAAGCCCGACGTGTGCGTGTACTAC VYYGEDGLIIDNKAGGCGAGGACGGCCTGATCATCGACAACAAGGCCTACGGCAAGG YGKGYSLPIKQADEGCTACAGCCTGCCCATCAAGCAGGCCGACGAGATGTACAGGTA MYRYIEENKERNEKCATCGAGGAGAACAAGGAGAGGAACGAGAAGCTGAACCCCAAC LNPNKWWEIFDKDAAGTGGTGGGAGATCTTCGACAAGGACGTGATCCACTACCACTT VIHYHFAFVSGAFTCGCCTTCGTGAGCGGCGCCTTCACCGGCGGCTTCAAGGAGAGGC GGFKERLENIRMRSTGGAGAACATCAGGATGAGGAGCGGCATCTACGGCGCCGCCGT GIYGAAVNSMNLLLGAACAGCATGAACCTGCTGCTGATGGCCGAGGAGCTGAAGAGC MAEELKSGRLDYKGGCAGGCTGGACTACAAGGAGTGCTTCAAGCTGTTCGACTGCAA ECFKLFDCNDEIVLCGACGAGATCGTGCTG 47 VPVKSEVSLLKDYL 128GTGCCCGTGAAGAGCGAGGTGAGCCTGCTGAAGGACTACCTGA RSHLVHVDHKYLVGGAGCCACCTGGTGCACGTGGACCACAAGTACCTGGTGCTGCTG LLDLGFDGTSDRDYGACCTGGGCTTCGACGGCACCAGCGACAGGGACTACGAGATCC EIQTAQLLTGELNFAGACCGCCCAGCTGCTGACCGGCGAGCTGAACTTCAAGGGCGC KGARLGDTRKPDVCAGGCTGGGCGACACCAGGAAGCCCGACGTGTGCGTGTACTAC CVYYGEDGLIIDNKGGCGAGGACGGCCTGATCATCGACAACAAGGCCTACGGCAAGG AYGKGYSLPIKQADGCTACAGCCTGCCCATCAAGCAGGCCGACGAGATGTACAGGTA EMYRYIEENKERNECATCGAGGAGAACAAGGAGAGGAACGAGAAGCTGAACCCCAAC KLNPNKWWEIFGNAAGTGGTGGGAGATCTTCGGCAACGACGTGATCCACTACCACTT DVIHYHFAFVSGAFCGCCTTCGTGAGCGGCGCCTTCACCGGCGGCTTCAAGGAGAGGC TGGFKERLDNIRMRTGGACAACATCAGGATGAGGAGCGGCATCTACGGCGCCGCCGT SGIYGAAVNSMNLLGAACAGCATGAACCTGCTGCTGCTGGCCGAGGAGCTGAAGAGC LLAEELKSGRLGYKGGCAGGCTGGGCTACAAGGAGTGCTTCAAGCTGTTCGACTGCAA ECFKLFDCNDEIVLCGACGAGATCGTGCTG 48 ECVKDNVVDIKDR 129GAGTGCGTGAAGGACAACGTGGTGGACATCAAGGACAGGGTGA VRNKLIHLDHKYLAGGAACAAGCTGATCCACCTGGACCACAAGTACCTGGCCCTGATC LIDLAYSDAASRAKGACCTGGCCTACAGCGACGCCGCCAGCAGGGCCAAGAAGAACG KNADAREFEIQTADCCGACGCCAGGGAGTTCGAGATCCAGACCGCCGACCTGTTCACC LFTKELSFNGQRLGAAGGAGCTGAGCTTCAACGGCCAGAGGCTGGGCGACAGCAGGA DSRKPDVIISYGLDGAGCCCGACGTGATCATCAGCTACGGCCTGGACGGCACCATCGTG 
TIVDNKSYKDGFNIGACAACAAGAGCTACAAGGACGGCTTCAACATCAGCAGGACCT SRTCADEMSRYINEGCGCCGACGAGATGAGCAGGTACATCAACGAGAACAACCTGAG NNLRQKSLNPNEWGCAGAAGAGCCTGAACCCCAACGAGTGGTGGAAGAACTTCGAC WKNFDSTITAYTFLAGCACCATCACCGCCTACACCTTCCTGTTCATCACCAGCTACCTG FITSYLKGQFEDQLEAAGGGCCAGTTCGAGGACCAGCTGGAGTACGTGAGCAACGCCA YVSNANGGIKGAAIACGGCGGCATCAAGGGCGCCGCCATCGGCGTGGAGAGCCTGCT GVESLLYLSEGIKAGTACCTGAGCGAGGGCATCAAGGCCGGCAGGATCAGCCACGCC GRISHADFYSNFNNGACTTCTACAGCAACTTCAACAACAAGGAGATGATCTAC KEMIY 49 IAKSDFSIIKDNIRRK 130ATCGCCAAGAGCGACTTCAGCATCATCAAGGACAACATCAGGA LQYVNHKYLLLIDLGGAAGCTGCAGTACGTGAACCACAAGTACCTGCTGCTGATCGAC GFDSDSNRDYEIQTCTGGGCTTCGACAGCGACAGCAACAGGGACTACGAGATCCAGA AELLTTELAFKGARCCGCCGAGCTGCTGACCACCGAGCTGGCCTTCAAGGGCGCCAGG LGDTRKPDVCVYYCTGGGCGACACCAGGAAGCCCGACGTGTGCGTGTACTACGGCG GENGLIIDNKAYSKAGAACGGCCTGATCATCGACAACAAGGCCTACAGCAAGGGCTA GYSLPMSQADEMVCAGCCTGCCCATGAGCCAGGCCGACGAGATGGTGAGGTACATC RYIEENKARQSSINPGAGGAGAACAAGGCCAGGCAGAGCAGCATCAACCCCAACCAGT NQWWKIFEDTVCNGGTGGAAGATCTTCGAGGACACCGTGTGCAACTTCAACTACGCC FNYAFVSGEFTGGFTTCGTGAGCGGCGAGTTCACCGGCGGCTTCAAGGACAGGCTGAA KDRLNNICERTRVSCAACATCTGCGAGAGGACCAGGGTGAGCGGCGGCGCCATCAAC GGAINTINLLLLAEEACCATCAACCTGCTGCTGCTGGCCGAGGAGCTGAAGAGCGGCA LKSGRMSYPKCFSYGGATGAGCTACCCCAAGTGCTTCAGCTACTTCGACACCAACGAC FDTNDEVHI GAGGTGCACATC 50LKYLGIKKQNRAFE 131 CTGAAGTACCTGGGCATCAAGAAGCAGAACAGGGCCTTCGAGAIITAELFNTSYKLSA TCATCACCGCCGAGCTGTTCAACACCAGCTACAAGCTGAGCGCCTHLGGGRRPDVLV ACCCACCTGGGCGGCGGCAGGAGGCCCGACGTGCTGGTGTACA YNDNFGIIVDTKAYACGACAACTTCGGCATCATCGTGGACACCAAGGCCTACAAGGA KDGYGRNVNQEDECGGCTACGGCAGGAACGTGAACCAGGAGGACGAGATGGTGAGG MVRYITENNIRKQDTACATCACCGAGAACAACATCAGGAAGCAGGACATCAACAAGA INKNDWWKYFSKSIACGACTGGTGGAAGTACTTCAGCAAGAGCATCCCCAGCACCAG PSTSYYHLWISSQFCTACTACCACCTGTGGATCAGCAGCCAGTTCGTGGGCATGTTCA VGMFSDQLRETSSRGCGACCAGCTGAGGGAGACCAGCAGCAGGACCGGCGAGAACGG TGENGGAMNVEQLCGGCGCCATGAACGTGGAGCAGCTGCTGATCGGCGCCAACCAG LIGANQVLNNVLDPGTGCTGAACAACGTGCTGGACCCCAACTGCCTGCCCAAGTACAT NCLPKYMENKEIIFGGAGAACAAGGAGATCATCTTC 51 VPVKSEVSLCKDYL 
132GTGCCCGTGAAGAGCGAGGTGAGCCTGTGCAAGGACTACCTGA RSHLNHVDHKYLILGGAGCCACCTGAACCACGTGGACCACAAGTACCTGATCCTGCTG LDLGFDGTSDRDYEGACCTGGGCTTCGACGGCACCAGCGACAGGGACTACGAGATCC IQTAQLLTGELNFKAGACCGCCCAGCTGCTGACCGGCGAGCTGAACTTCAAGGGCGC GARLGDTRKPDVCCAGGCTGGGCGACACCAGGAAGCCCGACGTGTGCGTGTACTAC VYYGEDGLIIDNKAGGCGAGGACGGCCTGATCATCGACAACAAGGCCTACGGCAAGG YGKGYSLPIKQADEGCTACAGCCTGCCCATCAAGCAGGCCGACGAGATGTACAGGTA MYRYIEENKERNEKCATCGAGGAGAACAAGGAGAGGAACGAGAAGCTGAACCCCAAC LNPNKWWEIFDKDAAGTGGTGGGAGATCTTCGACAAGGACGTGATCCACTACCACTT VIHYHFAFVSGAFTCGCCTTCGTGAGCGGCGCCTTCACCGGCGGCTTCAGGGAGAGGC GGFRERLENIRMRSTGGAGAACATCAGGATGAGGAGCGGCATCTACGGCGCCGCCGT GIYGAAVNSMNLLLGAACAGCATGAACCTGCTGCTGATGGCCGAGGAGCTGAAGAGC MAEELKSGRLGYKGGCAGGCTGGGCTACAAGGAGTGCTTCAAGCTGTTCGACTGCAA ECFKLFDCNDEIVLCGACGAGATCGTGCTG 52 VPVKSEVSLLKDYL 133GTGCCCGTGAAGAGCGAGGTGAGCCTGCTGAAGGACTACCTGA RTHLLHVDHRYLILGGACCCACCTGCTGCACGTGGACCACAGGTACCTGATCCTGCTG LDLGFDGTSDRDYEGACCTGGGCTTCGACGGCACCAGCGACAGGGACTACGAGATCC IQTAQLLTGELNFKAGACCGCCCAGCTGCTGACCGGCGAGCTGAACTTCAAGGGCGC GARLGDTRKPDVCCAGGCTGGGCGACACCAGGAAGCCCGACGTGTGCGTGTACTAC VYYGEDGLIIDNKAGGCGAGGACGGCCTGATCATCGACAACAAGGCCTACGGCAAGG YGKGYSLPIKQADEGCTACAGCCTGCCCATCAAGCAGGCCGACGAGATGTACAGGTA MYRYIEENKERNEKCATCGAGGAGAACAAGGAGAGGAACGAGAAGCTGAACCCCAAC LNPNKWWEIFDNDAAGTGGTGGGAGATCTTCGACAACGACGTGATCCACTACCACTT VIHYHFAFISGAFTGCGCCTTCATCAGCGGCGCCTTCACCGGCGGCTTCAAGGAGAGGC GFKERLDNIRMRSGTGGACAACATCAGGATGAGGAGCGGCATCTACGGCGCCGCCGT IYGAAVNSMNLLLGAACAGCATGAACCTGCTGCTGATGGCCGAGGAGCTGAAGAGC MAEELKSGRLGYKGGCAGGCTGGGCTACAAGGAGTGCTTCAAGCTGTTCGACTGCAA ECFKLFDCNDEIVLCGACGAGATCGTGCTG 53 VPVKSEVSLCKDYL 134GTGCCCGTGAAGAGCGAGGTGAGCCTGTGCAAGGACTACCTGA RSHLNHVDHKYLILGGAGCCACCTGAACCACGTGGACCACAAGTACCTGATCCTGCTG LDLGFDGTSDRDYEGACCTGGGCTTCGACGGCACCAGCGACAGGGACTACGAGATCC IQTAQLLTGELNFKAGACCGCCCAGCTGCTGACCGGCGAGCTGAACTTCAAGGGCGC GARLGDTRKPDVCCAGGCTGGGCGACACCAGGAAGCCCGACGTGTGCGTGTACTAC VYYGEDGLIIDNKAGGCGAGGACGGCCTGATCATCGACAACAAGGCCTACGGCAAGG YGKGYSLPIKQADEGCTACAGCCTGCCCATCAAGCAGGCCGACGAGATGTACAGGTA 
MYRYIEENKERNEKCATCGAGGAGAACAAGGAGAGGAACGAGAAGCTGAACCCCAAC LNPNKWWEIFDNDAAGTGGTGGGAGATCTTCGACAACGACGTGATCCACTACCACTT VIHYHFAFVSGAFTCGCCTTCGTGAGCGGCGCCTTCACCGGCGGCTTCAGGGAGAGGC GGFRERLENIRMRSTGGAGAACATCAGGATGAGGAGCGGCATCTACGGCGCCGCCGT GIYGAAVNSMNLLLGAACAGCATGAACCTGCTGCTGATGGCCGAGGAGCTGAAGAGC MAEELKSGRLGYKGGCAGGCTGGGCTACAAGGAGTGCTTCAAGCTGTTCGACTGCAA ECFKLFDCNDEIVLCGACGAGATCGTGCTG 54 VPVKSEMSLLKDYL 135GTGCCCGTGAAGAGCGAGATGAGCCTGCTGAAGGACTACCTGA RTHLLHVDHRYLILGGACCCACCTGCTGCACGTGGACCACAGGTACCTGATCCTGCTG LDLGFDGASDRDYEGACCTGGGCTTCGACGGCGCCAGCGACAGGGACTACGAGATCC IQTAQLLTGELNFKAGACCGCCCAGCTGCTGACCGGCGAGCTGAACTTCAAGGGCGC GARLGDTRKPDVCCAGGCTGGGCGACACCAGGAAGCCCGACGTGTGCGTGTACTAC VYYGEDGLIIDNKAGGCGAGGACGGCCTGATCATCGACAACAAGGCCTACGGCAAGG YGKGYSLPIKQADEGCTACAGCCTGCCCATCAAGCAGGCCGACGAGATGTACAGGTA MYRYIEENKERNEKCATCGAGGAGAACAAGGAGAGGAACGAGAAGCTGAACCCCAAC LNPNKWWEIFDNDAAGTGGTGGGAGATCTTCGACAACGACGTGATCCACTACCACTT VIHYHFAFVSGAFTCGCCTTCGTGAGCGGCGCCTTCACCGGCGGCTTCAAGGAGAGGC GGFKERLDNIRMRSTGGACAACATCAGGATGAGGAGCGGCATCTACGGCGCCGCCGT GIYGAAVNSMNLLLGAACAGCATGAACCTGCTGCTGATGGCCGAGGAGCTGAAGAGC MAEELKSGRLGYKGGCAGGCTGGGCTACAAGGAGTGCTTCAAGCTGTTCGACTGCAA ECFKLFDCNDEIVLCGACGAGATCGTGCTG 55 ILVDKEREMRKAKF 136ATCCTGGTGGACAAGGAGAGGGAGATGAGGAAGGCCAAGTTCC LKETVLDSKFISLLDTGAAGGAGACCGTGCTGGACAGCAAGTTCATCAGCCTGCTGGAC LAADATKSRDFEIVCTGGCCGCCGACGCCACCAAGAGCAGGGACTTCGAGATCGTGA TAELFKEAYNLNS VCCGCCGAGCTGTTCAAGGAGGCCTACAACCTGAACAGCGTGCTG LLGGSNKPDGLVFTCTGGGCGGCAGCAACAAGCCCGACGGCCTGGTGTTCACCGACG DDFGILLDTKAYKNACTTCGGCATCCTGCTGGACACCAAGGCCTACAAGAACGGCTTC GFSIYAKDRDQMIRAGCATCTACGCCAAGGACAGGGACCAGATGATCAGGTACGTGG YVDDNNKRDKIRNACGACAACAACAAGAGGGACAAGATCAGGAACCCCAACGAGTG PNEWWKSFSPLIPNGTGGAAGAGCTTCAGCCCCCTGATCCCCAACGACAAGTTCTACT DKFYYLWVSNFFKACCTGTGGGTGAGCAACTTCTTCAAGGGCCAGTTCAAGAACCAG GQFKNQIEYVNRETATCGAGTACGTGAACAGGGAGACCAACACCTACGGCGCCGTGC NTYGAVLNVEQLLTGAACGTGGAGCAGCTGCTGTACGGCGCCGACGCCGTGATCAA YGADAVIKGIINPNGGGCATCATCAACCCCAACAAGCTGCACGAGTACTTCAGCAACG KLHEYFSNDEIKFACGAGATCAAGTTC 56 TVDEKERLELKEYF 
137ACCGTGGACGAGAAGGAGAGGCTGGAGCTGAAGGAGTACTTCA ISNTRIPSKYITLLDLTCAGCAACACCAGGATCCCCAGCAAGTACATCACCCTGCTGGAC AYDGNANRDFEIVTCTGGCCTACGACGGCAACGCCAACAGGGACTTCGAGATCGTGA AELFKDIFKLQSKHCCGCCGAGCTGTTCAAGGACATCTTCAAGCTGCAGAGCAAGCAC MGGTRKPDILIWTDATGGGCGGCACCAGGAAGCCCGACATCCTGATCTGGACCGACA KFGVIADTKAYSKGAGTTCGGCGTGATCGCCGACACCAAGGCCTACAGCAAGGGCTA YKKNISEADKMVRCAAGAAGAACATCAGCGAGGCCGACAAGATGGTGAGGTACGTG YVNENTNRNKVDNAACGAGAACACCAACAGGAACAAGGTGGACAACACCAACGAGT TNEWWNSFDSRIPKGGTGGAACAGCTTCGACAGCAGGATCCCCAAGGACGCCTACTA DAYYFLWISSEFVGCTTCCTGTGGATCAGCAGCGAGTTCGTGGGCAAGTTCGACGAGC KFDEQLTETSSRTGAGCTGACCGAGACCAGCAGCAGGACCGGCAGGAACGGCGCCAG RNGASINVYQLLRGCATCAACGTGTACCAGCTGCTGAGGGGCGCCGACCTGGTGCAGA ADLVQKSKFNIHDLAGAGCAAGTTCAACATCCACGACCTGCCCAACCTGATGCAGAAC PNLMQNNEIKF AACGAGATCAAGTTC57 TLQKSDIEKFKNQL 138 ACCCTGCAGAAGAGCGACATCGAGAAGTTCAAGAACCAGCTGARTELTNIDHSYLKGI GGACCGAGCTGACCAACATCGACCACAGCTACCTGAAGGGCATDIASKKTTTNVENT CGACATCGCCAGCAAGAAGACCACCACCAACGTGGAGAACACCEFEAISTKVFTDELG GAGTTCGAGGCCATCAGCACCAAGGTGTTCACCGACGAGCTGGFFGEHLGGSNKPDG GCTTCTTCGGCGAGCACCTGGGCGGCAGCAACAAGCCCGACGGLIWDNDCAIILDSK CCTGATCTGGGACAACGACTGCGCCATCATCCTGGACAGCAAGGAYSEGFPLTASHTD CCTACAGCGAGGGCTTCCCCCTGACCGCCAGCCACACCGACGCCAMGRYLRQFKERK ATGGGCAGGTACCTGAGGCAGTTCAAGGAGAGGAAGGAGGAGA EEIKPTWWDIAPDNTCAAGCCCACCTGGTGGGACATCGCCCCCGACAACCTGGCCAAC LANTYFAYVSGSFSACCTACTTCGCCTACGTGAGCGGCAGCTTCAGCGGCAACTACAA GNYKAQLQKFRQDGGCCCAGCTGCAGAAGTTCAGGCAGGACACCAACCACATGGGC TNHMGGALEFVKLGGCGCCCTGGAGTTCGTGAAGCTGCTGCTGCTGGCCAACAACTA LLLANNYKAHKMSICAAGGCCCACAAGATGAGCATCAACGAGGTGAAGGAGAGCATC NEVKESILDYNISYCTGGACTACAACATCAGCTAC 58 VKEKTDAALVKER 139GTGAAGGAGAAGACCGACGCCGCCCTGGTGAAGGAGAGGGTGA VRLQLHNINHKYLAGGCTGCAGCTGCACAACATCAACCACAAGTACCTGGCCCTGATC LIDYAFSGKNNS RDGACTACGCCTTCAGCGGCAAGAACAACAGCAGGGACTTCGAGG FEVYTIDLLVNELTFTGTACACCATCGACCTGCTGGTGAACGAGCTGACCTTCGGCGGC GGLHLGGTRKPDGICTGCACCTGGGCGGCACCAGGAAGCCCGACGGCATCTTCTACCA FYHGSNGIIIDNKAYCGGCAGCAACGGCATCATCATCGACAACAAGGCCTACGCCAAG AKGFVITRNMADEGGCTTCGTGATCACCAGGAACATGGCCGACGAGATGATCAGGT 
MIRYVQENNDRNPEACGTGCAGGAGAACAACGACAGGAACCCCGAGAGGAACCCCAA RNPNCWWKGFPHDCTGCTGGTGGAAGGGCTTCCCCCACGACGTGACCAGGTACAACT VTRYNYVFISSMFKACGTGTTCATCAGCAGCATGTTCAAGGGCGAGGTGGAGCACATG GEVEHMLDNIRQSTCTGGACAACATCAGGCAGAGCACCGGCATCGACGGCTGCGTGC GIDGCVLTIENLLYTGACCATCGAGAACCTGCTGTACTACGCCGACGCCATCAAGGGC YADAIKGGTLSKATGGCACCCTGAGCAAGGCCACCTTCATCAACGGCTTCAACGCCAA FINGFNANKEMVFCAAGGAGATGGTGTTC 59 VKETTDSVIIKDRV 140GTGAAGGAGACCACCGACAGCGTGATCATCAAGGACAGGGTGA RLKLHHVNHKYLTGGCTGAAGCTGCACCACGTGAACCACAAGTACCTGACCCTGATC LIDYAFSGKNNCMDGACTACGCCTTCAGCGGCAAGAACAACTGCATGGACTTCGAGGT FEVYTIDLLVNELAGTACACCATCGACCTGCTGGTGAACGAGCTGGCCTTCAACGGCG FNGVHLGGTRKPDTGCACCTGGGCGGCACCAGGAAGCCCGACGGCATCTTCTACCAC GIFYHNRNGIIIDNKAACAGGAACGGCATCATCATCGACAACAAGGCCTACAGCCACG AYSHGFTLSRAMAGCTTCACCCTGAGCAGGGCCATGGCCGACGAGATGATCAGGTAC DEMIRYIQENNDRNATCCAGGAGAACAACGACAGGAACCCCGAGAGGAACCCCAACA PERNPNKWWENFDAGTGGTGGGAGAACTTCGACAAGGGCGTGAACCAGTTCAACTTC KGVNQFNFVFISSLFGTGTTCATCAGCAGCCTGTTCAAGGGCGAGATCGAGCACATGCT KGEIEHMLTNIKQSGACCAACATCAAGCAGAGCACCGACGGCGTGGAGGGCTGCGTG TDGVEGCVLSAENLCTGAGCGCCGAGAACCTGCTGTACTTCGCCGAGGCCATGAAGAG LYFAEAMKSGVMPCGGCGTGATGCCCAAGACCGAGTTCATCAGCTACTTCGGCGCCG KTEFISYFGAGKEIQGCAAGGAGATCCAGTTC F 60 SACKADITELKDKI 141AGCGCCTGCAAGGCCGACATCACCGAGCTGAAGGACAAGATCA RKSLKVLDHKYLVGGAAGAGCCTGAAGGTGCTGGACCACAAGTACCTGGTGCTGGT LVDLAYSDASTKSKGGACCTGGCCTACAGCGACGCCAGCACCAAGAGCAAGAAGAAC KNSDAREFEIQTADAGCGACGCCAGGGAGTTCGAGATCCAGACCGCCGACCTGTTCAC LFTKELKFDGMRLGCAAGGAGCTGAAGTTCGACGGCATGAGGCTGGGCGACAGCAAC DSNRPDVIISHDNFGAGGCCCGACGTGATCATCAGCCACGACAACTTCGGCACCATCAT TIIDNKSYKDGFNIDCGACAACAAGAGCTACAAGGACGGCTTCAACATCGACAAGAAG KKCADEMSRYINENTGCGCCGACGAGATGAGCAGGTACATCAACGAGAACCAGAGGA QRRIPELPKNEWWKGGATCCCCGAGCTGCCCAAGAACGAGTGGTGGAAGAACTTCGA NFDVNVDIFTFLFITCGTGAACGTGGACATCTTCACCTTCCTGTTCATCACCAGCTACCT SYLKGNFKDQLEYIGAAGGGCAACTTCAAGGACCAGCTGGAGTACATCAGCAAGAGC SKSQSDIKGAAISVECAGAGCGACATCAAGGGCGCCGCCATCAGCGTGGAGCACCTGC HLLYISEKVKNGSMTGTACATCAGCGAGAAGGTGAAGAACGGCAGCATGGACAAGGC 
DKADFFKLFNNDEICGACTTCTTCAAGCTGTTCAACAACGACGAGATCAGGGTG RV 61 VLKDKHLEKIKEKF 142GTGCTGAAGGACAAGCACCTGGAGAAGATCAAGGAGAAGTTCC LENTSLDPRFISLIEITGGAGAACACCAGCCTGGACCCCAGGTTCATCAGCCTGATCGAG SRDKKQNRAFEIITAATCAGCAGGGACAAGAAGCAGAACAGGGCCTTCGAGATCATCA ELFNTSYNLSAIHLGCCGCCGAGCTGTTCAACACCAGCTACAACCTGAGCGCCATCCAC GGRRPDVLAYNDNCTGGGCGGCGGCAGGAGGCCCGACGTGCTGGCCTACAACGACA FGIIVDTKAYKNGYACTTCGGCATCATCGTGGACACCAAGGCCTACAAGAACGGCTAC GRNVNQEDEMVRYGGCAGGAACGTGAACCAGGAGGACGAGATGGTGAGGTACATCA ITENKIRKQDISKNNCCGAGAACAAGATCAGGAAGCAGGACATCAGCAAGAACAACTG WWKYFSKSIPSTSYGTGGAAGTACTTCAGCAAGAGCATCCCCAGCACCAGCTACTACC YHLWISSEFVGMFSACCTGTGGATCAGCAGCGAGTTCGTGGGCATGTTCAGCGACCAG DQLRETSSRTGENGCTGAGGGAGACCAGCAGCAGGACCGGCGAGAACGGCGGCGCCA GAMNVEQLLIGANTGAACGTGGAGCAGCTGCTGATCGGCGCCAACCAGGTGCTGAA QVLNNVLDPNRLPECAACGTGCTGGACCCCAACAGGCTGCCCGAGTACATGGAGAAC YMENKEIIF AAGGAGATCATCTTC 62ALKDKHLEKIKEKF 143 GCCCTGAAGGACAAGCACCTGGAGAAGATCAAGGAGAAGTTCCLENTSLDPRFISLIEI TGGAGAACACCAGCCTGGACCCCAGGTTCATCAGCCTGATCGAGSRDKKQNRAFEIITA ATCAGCAGGGACAAGAAGCAGAACAGGGCCTTCGAGATCATCAELFNTSYKLSATHL CCGCCGAGCTGTTCAACACCAGCTACAAGCTGAGCGCCACCCACGGGRRPDVLVYND CTGGGCGGCGGCAGGAGGCCCGACGTGCTGGTGTACAACGACA NFGIIVDTKAYKDGACTTCGGCATCATCGTGGACACCAAGGCCTACAAGGACGGCTAC YGRNVNQEDEMVRGGCAGGAACGTGAACCAGGAGGACGAGATGGTGAGGTACATCA YITENNIRKQDINKNCCGAGAACAACATCAGGAAGCAGGACATCAACAAGAACGACTG DWWKYFSKSIPSTSGTGGAAGTACTTCAGCAAGAGCATCCCCAGCACCAGCTACTACC YYHLWISSQFVGMFACCTGTGGATCAGCAGCCAGTTCGTGGGCATGTTCAGCGACCAG SDQLRETSSRTGENCTGAGGGAGACCAGCAGCAGGACCGGCGAGAACGGCGGCGCCA GGAMNVEQLLIGATGAACGTGGAGCAGCTGCTGATCGGCGCCAACCAGGTGCTGAA NQVLNNVLDPNCLPCAACGTGCTGGACCCCAACTGCCTGCCCAAGTACATGGAGAACA KYMENKEIIF AGGAGATCATCTTC63 VLEKSDIEKFKNQL 144 GTGCTGGAGAAGAGCGACATCGAGAAGTTCAAGAACCAGCTGARTELTNIDHSYLKGI GGACCGAGCTGACCAACATCGACCACAGCTACCTGAAGGGCATDIASKKKTSNVENT CGACATCGCCAGCAAGAAGAAGACCAGCAACGTGGAGAACACCEFEAISTKIFTDELG GAGTTCGAGGCCATCAGCACCAAGATCTTCACCGACGAGCTGGGFSGKHLGGSNKPDG CTTCAGCGGCAAGCACCTGGGCGGCAGCAACAAGCCCGACGGCLLWDDDCAIILDSK 
CTGCTGTGGGACGACGACTGCGCCATCATCCTGGACAGCAAGGCAYSEGFPLTASHTD CTACAGCGAGGGCTTCCCCCTGACCGCCAGCCACACCGACGCCAAMGRYLRQFTERK TGGGCAGGTACCTGAGGCAGTTCACCGAGAGGAAGGAGGAGAT EEIKPTWWDIAPEHCAAGCCCACCTGGTGGGACATCGCCCCCGAGCACCTGGACAAC LDNTYFAYVSGSFSACCTACTTCGCCTACGTGAGCGGCAGCTTCAGCGGCAACTACAA GNYKEQLQKFRQDGGAGCAGCTGCAGAAGTTCAGGCAGGACACCAACCACCTGGGC TNHLGGALEFVKLLGGCGCCCTGGAGTTCGTGAAGCTGCTGCTGCTGGCCAACAACTA LLANNYKTQKMSKCAAGACCCAGAAGATGAGCAAGAAGGAGGTGAAGAAGAGCAT KEVKKSILDYNISYCCTGGACTACAACATCAGCTAC 64 AEADVTSEKIKNHF 145GCCGAGGCCGACGTGACCAGCGAGAAGATCAAGAACCACTTCA RRVTELPERYLELLGGAGGGTGACCGAGCTGCCCGAGAGGTACCTGGAGCTGCTGGA DIAFDHKRNRDFEMCATCGCCTTCGACCACAAGAGGAACAGGGACTTCGAGATGGTG VTAGLFKDVYGLESACCGCCGGCCTGTTCAAGGACGTGTACGGCCTGGAGAGCGTGCA VHLGGANKPDGVVCCTGGGCGGCGCCAACAAGCCCGACGGCGTGGTGTACAACGAC YNDNFGIILDTKAYAACTTCGGCATCATCCTGGACACCAAGGCCTACGAGAACGGCTA ENGYGKHISQIDEMCGGCAAGCACATCAGCCAGATCGACGAGATGGTGAGGTACATC VRYIDDNRLRDTTRGACGACAACAGGCTGAGGGACACCACCAGGAACCCCAACAAGT NPNKWWENFDADIGGTGGGAGAACTTCGACGCCGACATCCCCAGCGACCAGTTCTAC PSDQFYYLWVSGKFTACCTGTGGGTGAGCGGCAAGTTCCTGCCCAACTTCGCCGAGCA LPNFAEQLKQTNYRGCTGAAGCAGACCAACTACAGGAGCCACGCCAACGGCGGCGGC SHANGGGLEVQQLCTGGAGGTGCAGCAGCTGCTGCTGGGCGCCGACGCCGTGAAGA LLGADAVKRRKLDGGAGGAAGCTGGACGTGAACACCATCCCCAACTACATGAAGAA VNTIPNYMKNEVITCGAGGTGATCACCCTG L 65 AEADLNSEKIKNHY 146GCCGAGGCCGACCTGAACAGCGAGAAGATCAAGAACCACTACA RKITNLPEKYIELLDGGAAGATCACCAACCTGCCCGAGAAGTACATCGAGCTGCTGGA IAFDHRRHQDFEIVTCATCGCCTTCGACCACAGGAGGCACCAGGACTTCGAGATCGTGA AGLFKDCYGLSSIHCCGCCGGCCTGTTCAAGGACTGCTACGGCCTGAGCAGCATCCAC LGGQNKPDGVVFNCTGGGCGGCCAGAACAAGCCCGACGGCGTGGTGTTCAACAACA NKFGIILDTKAYEKAGTTCGGCATCATCCTGGACACCAAGGCCTACGAGAAGGGCTAC GYGMHIGQIDEMCGGCATGCACATCGGCCAGATCGACGAGATGTGCAGGTACATCG RYIDDNKKRDIVRQACGACAACAAGAAGAGGGACATCGTGAGGCAGCCCAACGAGTG PNEWWKNFGDNIPGTGGAAGAACTTCGGCGACAACATCCCCAAGGACCAGTTCTACT KDQFYYLWISGKFLACCTGTGGATCAGCGGCAAGTTCCTGCCCAGGTTCAACGAGCAG PRFNEQLKQTHYRTCTGAAGCAGACCCACTACAGGACCAGCATCAACGGCGGCGGCC SINGGGLEVSQLLLTGGAGGTGAGCCAGCTGCTGCTGGGCGCCAACGCCGCCATGAA 
GANAAMKGKLDVGGGCAAGCTGGACGTGAACACCCTGCCCAAGCACATGAACAAC NTLPKHMNNQVIKLCAGGTGATCAAGCTG 66 VLKDAALQKTKNT 147GTGCTGAAGGACGCCGCCCTGCAGAAGACCAAGAACACCCTGC LLNELTEIDPADIEVTGAACGAGCTGACCGAGATCGACCCCGCCGACATCGAGGTGAT IEMSWKKATTRSQNCGAGATGAGCTGGAAGAAGGCCACCACCAGGAGCCAGAACACC TLEATLFEVKVVEIFCTGGAGGCCACCCTGTTCGAGGTGAAGGTGGTGGAGATCTTCAA KKYFELNGEHLGGGAAGTACTTCGAGCTGAACGGCGAGCACCTGGGCGGCCAGAAC QNRPDGAVYYNSTAGGCCCGACGGCGCCGTGTACTACAACAGCACCTACGGCATCAT YGIILDTKAYSNGYCCTGGACACCAAGGCCTACAGCAACGGCTACAACATCCCCGTGG NIPVDQQREMVDYIACCAGCAGAGGGAGATGGTGGACTACATCACCGACGTGATCGA TDVIDKNQNVTPNRCAAGAACCAGAACGTGACCCCCAACAGGTGGTGGGAGGCCTTC WWEAFPATLLKNNICCCGCCACCCTGCTGAAGAACAACATCTACTACCTGTGGGTGGC YYLWVAGGFTGKYCGGCGGCTTCACCGGCAAGTACCTGGACCAGCTGACCAGGACCC LDQLTRTHNQTNMACAACCAGACCAACATGGACGGCGGCGCCATGACCACCGAGGT DGGAMTTEVLLRLGCTGCTGAGGCTGGCCAACAAGGTGAGCAGCGGCAACCTGAAG ANKVSSGNLKTTDIACCACCGACATCCCCAAGCTGATGACCAACAAGCTGATCCTGAG PKLMTNKLILS C 67AEADLDSERIKNHY 148 GCCGAGGCCGACCTGGACAGCGAGAGGATCAAGAACCACTACARKITNLPEKYIELLD GGAAGATCACCAACCTGCCCGAGAAGTACATCGAGCTGCTGGAIAFDHHRHQDFEIIT CATCGCCTTCGACCACCACAGGCACCAGGACTTCGAGATCATCAAGLFKDCYGLSSIH CCGCCGGCCTGTTCAAGGACTGCTACGGCCTGAGCAGCATCCACLGGQNKPDGVVFN CTGGGCGGCCAGAACAAGCCCGACGGCGTGGTGTTCAACGGCA GKFGIILDTKAYEKAGTTCGGCATCATCCTGGACACCAAGGCCTACGAGAAGGGCTAC GYGMHINQIDEMCGGCATGCACATCAACCAGATCGACGAGATGTGCAGGTACATCG RYIEDNKQRDKIRQAGGACAACAAGCAGAGGGACAAGATCAGGCAGCCCAACGAGTG PNEWWNNFGDNIPGTGGAACAACTTCGGCGACAACATCCCCGAGAACAAGTTCTACT ENKFYYLWVSGKFACCTGTGGGTGAGCGGCAAGTTCCTGCCCAAGTTCAACGAGCAG LPKFNEQLKQTHYRCTGAAGCAGACCCACTACAGGACCGGCATCAACGGCGGCGGCC TGINGGGLEVSQLLTGGAGGTGAGCCAGCTGCTGCTGGGCGCCGACGCCGTGATGAA LGADAVMKGALNVGGGCGCCCTGAACGTGAACATCCTGCCCACCTACATGCACAACA NILPTYMHNNVIQ ACGTGATCCAG68 EISDIALQKEKAYFY 149 GAGATCAGCGACATCGCCCTGCAGAAGGAGAAGGCCTACTTCTKNTALSKRHISILEI ACAAGAACACCGCCCTGAGCAAGAGGCACATCAGCATCCTGGAAFDGSKNRDLEILS GATCGCCTTCGACGGCAGCAAGAACAGGGACCTGGAGATCCTGAEVFKDYYQLESIH AGCGCCGAGGTGTTCAAGGACTACTACCAGCTGGAGAGCATCCLGGGLKPDGIAFNQ 
ACCTGGGCGGCGGCCTGAAGCCCGACGGCATCGCCTTCAACCAGNFGIIVDTKAYKGV AACTTCGGCATCATCGTGGACACCAAGGCCTACAAGGGCGTGTAYSRSRAEADKMFR CAGCAGGAGCAGGGCCGAGGCCGACAAGATGTTCAGGTACATC YIEDNKKRDPKRNQGAGGACAACAAGAAGAGGGACCCCAAGAGGAACCAGAGCCTGT SLWWRSFNEHIPANGGTGGAGGAGCTTCAACGAGCACATCCCCGCCAACAACTTCTAC NFYFLWISGKFQRNTTCCTGTGGATCAGCGGCAAGTTCCAGAGGAACTTCGACACCCA FDTQINQLNYETGYGATCAACCAGCTGAACTACGAGACCGGCTACAGGGGCGGCGCC RGGALSARQFLIGACTGAGCGCCAGGCAGTTCCTGATCGGCGCCGACGCCATCCAGAA DAIQKGKIDINDLPSGGGCAAGATCGACATCAACGACCTGCCCAGCTACTTCAACAACA YFNNSVISF GCGTGATCAGCTTC 69TSREKSRLNLKEYF 150 ACCAGCAGGGAGAAGAGCAGGCTGAACCTGAAGGAGTACTTCGVSNTNLPNKFITLLD TGAGCAACACCAACCTGCCCAACAAGTTCATCACCCTGCTGGACLAYDGKANRDFELI CTGGCCTACGACGGCAAGGCCAACAGGGACTTCGAGCTGATCATSELFREIYKLNTRH CCAGCGAGCTGTTCAGGGAGATCTACAAGCTGAACACCAGGCALGGTRKPDILIWNE CCTGGGCGGCACCAGGAAGCCCGACATCCTGATCTGGAACGAGNFGIIADTKAYSKG AACTTCGGCATCATCGCCGACACCAAGGCCTACAGCAAGGGCTAYKKNISEEDKMVR CAAGAAGAACATCAGCGAGGAGGACAAGATGGTGAGGTACATC YIDENIKRSKDYNPGACGAGAACATCAAGAGGAGCAAGGACTACAACCCCAACGAGT NEWWKVFDNEISSGGTGGAAGGTGTTCGACAACGAGATCAGCAGCAACAACTACTT NNYFYLWISSEFIGKCTACCTGTGGATCAGCAGCGAGTTCATCGGCAAGTTCGAGGAGC FEEQLQETAQRTNVAGCTGCAGGAGACCGCCCAGAGGACCAACGTGAAGGGCGCCAG KGASINVYQLLMGCATCAACGTGTACCAGCTGCTGATGGGCGCCCACAAGGTGCAGA AHKVQTKELNVNSICCAAGGAGCTGAACGTGAACAGCATCCCCAAGTACATGAACAA PKYMNNTEIKF CACCGAGATCAAGTTC70 NCIKDSIIDIKDRVR 151 AACTGCATCAAGGACAGCATCATCGACATCAAGGACAGGGTGATKLVHLDHKYLALI GGACCAAGCTGGTGCACCTGGACCACAAGTACCTGGCCCTGATCDLAFSDADTRTKKN GACCTGGCCTTCAGCGACGCCGACACCAGGACCAAGAAGAACASDAREFEIQTADLFT GCGACGCCAGGGAGTTCGAGATCCAGACCGCCGACCTGTTCACCKELSFNGQRLGDSR AAGGAGCTGAGCTTCAACGGCCAGAGGCTGGGCGACAGCAGGAKPDIIISFDKIGTIIDN AGCCCGACATCATCATCAGCTTCGACAAGATCGGCACCATCATCKSYKDGFNISRPCA GACAACAAGAGCTACAAGGACGGCTTCAACATCAGCAGGCCCTDEMIRYINENNLRK GCGCCGACGAGATGATCAGGTACATCAACGAGAACAACCTGAG KSLNANEWWNKFDGAAGAAGAGCCTGAACGCCAACGAGTGGTGGAACAAGTTCGAC PTITAYSFLFITSYLKCCCACCATCACCGCCTACAGCTTCCTGTTCATCACCAGCTACCTG GQFQEQLEYISNANAAGGGCCAGTTCCAGGAGCAGCTGGAGTACATCAGCAACGCCA 
GGIKGAAIGIENLLYACGGCGGCATCAAGGGCGCCGCCATCGGCATCGAGAACCTGCT LSEALKSGKISHKDGTACCTGAGCGAGGCCCTGAAGAGCGGCAAGATCAGCCACAAG FYQNFNNKEITYGACTTCTACCAGAACTTCAACAACAAGGAGATCACCTAC 71 LPQKDQVQQQQDE 152CTGCCCCAGAAGGACCAGGTGCAGCAGCAGCAGGACGAGCTGA LRPMLKNVDHRYLGGCCCATGCTGAAGAACGTGGACCACAGGTACCTGCAGCTGGT QLVELALDSDQNSEGGAGCTGGCCCTGGACAGCGACCAGAACAGCGAGTACAGCCAG YSQFEQLTMELVLKTTCGAGCAGCTGACCATGGAGCTGGTGCTGAAGCACCTGGACTT HLDFDGKPLGGSNKCGACGGCAAGCCCCTGGGCGGCAGCAACAAGCCCGACGGCATC PDGIAWDNDGNFIIFGCCTGGGACAACGACGGCAACTTCATCATCTTCGACACCAAGGC DTKAYNKGYSLAGCTACAACAAGGGCTACAGCCTGGCCGGCAACACCGACAAGGTG NTDKVKRYIDDVRAAGAGGTACATCGACGACGTGAGGGACAGGGACACCAGCAGGA DRDTSRTSTWWQLCCAGCACCTGGTGGCAGCTGGTGCCCAAGAGCATCGACGTGCAC VPKSIDVHNLLRFVAACCTGCTGAGGTTCGTGTACGTGAGCGGCAACTTCACCGGCAA YVSGNFTGNYMKLCTACATGAAGCTGCTGGACAGCCTGAGGAGCTGGAGCAACGCC LDSLRSWSNAQGGCAGGGCGGCCTGGCCAGCGTGGAGAAGCTGCTGCTGACCAGCG LASVEKLLLTSELYAGCTGTACCTGAGGAACATGTACAGCCACCAGGAGCTGATCGA LRNMYSHQELIDSWCAGCTGGACCGACAACAACGTGAAGCAC TDNNVKH 72 TTDAVVVKDRARV 153ACCACCGACGCCGTGGTGGTGAAGGACAGGGCCAGGGTGAGGC RLHNINHKYLTLIDTGCACAACATCAACCACAAGTACCTGACCCTGATCGACTACGCC YAFSGKNNCTEFEITTCAGCGGCAAGAACAACTGCACCGAGTTCGAGATCTACACCAT YTIDLLVNELAFNGICGACCTGCTGGTGAACGAGCTGGCCTTCAACGGCATCCACCTGG HLGGTRKPDGIFDYGCGGCACCAGGAAGCCCGACGGCATCTTCGACTACAACCAGCA NQQGIIIDNKAYSKGGGCATCATCATCGACAACAAGGCCTACAGCAAGGGCTTCACC GFTITRSMADEMVRATCACCAGGAGCATGGCCGACGAGATGGTGAGGTACGTGCAGG YVQENNDRNPERNAGAACAACGACAGGAACCCCGAGAGGAACAAGACCCAGTGGTG KTQWWLNFGDNVGCTGAACTTCGGCGACAACGTGAACCACTTCAACTTCGTGTTCA NHFNFVFISSMFKGTCAGCAGCATGTTCAAGGGCGAGGTGAGGCACATGCTGAACAA EVRHMLNNIKQSTGCATCAAGCAGAGCACCGGCGTGGACGGCTGCGTGCTGACCGCC VDGCVLTAENLLYFGAGAACCTGCTGTACTTCGCCGACGCCATCAAGGGCGGCACCGT ADAIKGGTVKRTDFGAAGAGGACCGACTTCATCAACCTGTTCGGCAAGAACGACGAG INLFGKNDEL CTG 73LPKKDNVQRQQDE 154 CTGCCCAAGAAGGACAACGTGCAGAGGCAGCAGGACGAGCTGALRPLLKHVDHRYLQ GGCCCCTGCTGAAGCACGTGGACCACAGGTACCTGCAGCTGGTGLVELALDSSQNSEY GAGCTGGCCCTGGACAGCAGCCAGAACAGCGAGTACAGCATGCSMLESMTMELLLTH 
TGGAGAGCATGACCATGGAGCTGCTGCTGACCCACCTGGACTTCLDFDGASLGGASKP GACGGCGCCAGCCTGGGCGGCGCCAGCAAGCCCGACGGCATCGDGIAWDKDGNFLIV CCTGGGACAAGGACGGCAACTTCCTGATCGTGGACACCAAGGCDTKAYDNGYS LAG CTACGACAACGGCTACAGCCTGGCCGGCAACACCGACAAGGTG NTDKVARYIDDVRGCCAGGTACATCGACGACGTGAGGGCCAAGGACCCCAACAGGG AKDPNRASTWWTQCCAGCACCTGGTGGACCCAGGTGCCCGAGAGCCTGAACGTGGA VPESLNVDDNLSFMCGACAACCTGAGCTTCATGTACGTGAGCGGCAGCTTCACCGGCA YVSGSFTGNYQRLLACTACCAGAGGCTGCTGAAGGACCTGAGGGCCAGGACCAACGC KDLRARTNARGGLCAGGGGCGGCCTGACCACCGTGGAGAAGCTGCTGCTGACCAGC TTVEKLLLTSEAYLGAGGCCTACCTGGCCAAGAGCGGCTACGGCCACACCCAGCTGCT AKSGYGHTQLLNDGAACGACTGGACCGACGACAACATCGACCAC WTDDNIDH 74 QIKDKYLEDLKLEL 155CAGATCAAGGACAAGTACCTGGAGGACCTGAAGCTGGAGCTGT YKKTNLPNKYYEMACAAGAAGACCAACCTGCCCAACAAGTACTACGAGATGGTGGA VDIAYDGKRNREFECATCGCCTACGACGGCAAGAGGAACAGGGAGTTCGAGATCTAC IYTSDLMQEIYGFKACCAGCGACCTGATGCAGGAGATCTACGGCTTCAAGACCACCCT TTLLGGTRKPDVVSGCTGGGCGGCACCAGGAAGCCCGACGTGGTGAGCTACAGCGAC YSDAHGYIIDTKAYGCCCACGGCTACATCATCGACACCAAGGCCTACGCCAACGGCTA ANGYRKEIKQEDECAGGAAGGAGATCAAGCAGGAGGACGAGATGGTGAGGTACATC MVRYIEDNQLKDVGAGGACAACCAGCTGAAGGACGTGCTGAGGAACCCCAACAAGT LRNPNKWWECFDDGGTGGGAGTGCTTCGACGACGCCGAGCACAAGAAGGAGTACTA AEHKKEYYFLWISSCTTCCTGTGGATCAGCAGCAAGTTCGTGGGCGAGTTCAGCAGCC KFVGEFSSQLQDTSAGCTGCAGGACACCAGCAGGAGGACCGGCATCAAGGGCGGCGC RRTGIKGGAVNIVQCGTGAACATCGTGCAGCTGCTGCTGGGCGCCCACCTGGTGTACA LLLGAHLVYSGEISGCGGCGAGATCAGCAAGGACCAGTTCGCCGCCTACATGAACAA KDQFAAYMNNTEICACCGAGATCAACTTC NF 75 MNPRNEIVIAKHLS 156ATGAACCCCAGGAACGAGATCGTGATCGCCAAGCACCTGAGCG GGNRPEIVCYHPEDGCGGCAACAGGCCCGAGATCGTGTGCTACCACCCCGAGGACAA KPDHGLILDSKAYKGCCCGACCACGGCCTGATCCTGGACAGCAAGGCCTACAAGAGC SGFTIPSGERDKMVGGCTTCACCATCCCCAGCGGCGAGAGGGACAAGATGGTGAGGT RYIEEYITKNQLQNPACATCGAGGAGTACATCACCAAGAACCAGCTGCAGAACCCCAA NEWWKNLKGAEYPCGAGTGGTGGAAGAACCTGAAGGGCGCCGAGTACCCCGGCATC GIVGFGFISNSFLGHGTGGGCTTCGGCTTCATCAGCAACAGCTTCCTGGGCCACTACAG YRKQLDYIMRRTKIGAAGCAGCTGGACTACATCATGAGGAGGACCAAGATCAAGGGC KGSSITTEHLLKTVEAGCAGCATCACCACCGAGCACCTGCTGAAGACCGTGGAGGACG 
DVLSEKGNVIDFFKTGCTGAGCGAGAAGGGCAACGTGATCGACTTCTTCAAGTACTTC YFLE CTGGAG 76EIKNQEIEELKQIAL 157 GAGATCAAGAACCAGGAGATCGAGGAGCTGAAGCAGATCGCCCNKYTALPSEWVELI TGAACAAGTACACCGCCCTGCCCAGCGAGTGGGTGGAGCTGATCEISRDKDQSTIFEMK GAGATCAGCAGGGACAAGGACCAGAGCACCATCTTCGAGATGAVAELFKTCYRIKSL AGGTGGCCGAGCTGTTCAAGACCTGCTACAGGATCAAGAGCCTGHLGGASKPDCLLW CACCTGGGCGGCGCCAGCAAGCCCGACTGCCTGCTGTGGGACG DDSFSVIVDAKAYKACAGCTTCAGCGTGATCGTGGACGCCAAGGCCTACAAGGACGG DGFPFQASEKDKMCTTCCCCTTCCAGGCCAGCGAGAAGGACAAGATGGTGAGGTACC VRYLRECERKDKATGAGGGAGTGCGAGAGGAAGGACAAGGCCGAGAACGCCACCG ENATEWWNNFPPEAGTGGTGGAACAACTTCCCCCCCGAGCTGAACAGCAACCAGCTG LNSNQLFFMFASSFTTCTTCATGTTCGCCAGCAGCTTCTTCAGCAGCACCGCCGAGAA FSSTAEKHLESVSIAGCACCTGGAGAGCGTGAGCATCGCCAGCAAGTTCAGCGGCTGC SKFSGCAWDVDNLGCCTGGGACGTGGACAACCTGCTGAGCGGCGCCAACTTCTTCCT LSGANFFLQNPQATGCAGAACCCCCAGGCCACCCTGCAGTACCACCTGATCAGGGTGT LQYHLIRVFSNKVVTCAGCAACAAGGTGGTGGAC D 77 LPHKDNVIKQQDEL 158CTGCCCCACAAGGACAACGTGATCAAGCAGCAGGACGAGCTGA RPMLKHVNHKYLQGGCCCATGCTGAAGCACGTGAACCACAAGTACCTGCAGCTGGTG LVELAFESSRNSEYSGAGCTGGCCTTCGAGAGCAGCAGGAACAGCGAGTACAGCCAGT QFETLTMELVLKYLTCGAGACCCTGACCATGGAGCTGGTGCTGAAGTACCTGGACTTC DFSGKSLGGANKPDAGCGGCAAGAGCCTGGGCGGCGCCAACAAGCCCGACGGCATCG GIAWDPLGNFLIFDCCTGGGACCCCCTGGGCAACTTCCTGATCTTCGACACCAAGGCC TKAYKHGYTLSNNTACAAGCACGGCTACACCCTGAGCAACAACACCGACAGGGTGG TDRVARYINDVRDCCAGGTACATCAACGACGTGAGGGACAAGGACATCCAGAGGAT KDIQRISRWWQSIPTCAGCAGGTGGTGGCAGAGCATCCCCACCTACATCGACGTGAAG YIDVKNKLQFVYISAACAAGCTGCAGTTCGTGTACATCAGCGGCAGCTTCACCGGCCA GSFTGHYLRLLNDLCTACCTGAGGCTGCTGAACGACCTGAGGAGCAGGACCAGGGCC RSRTRAKGGLVTVEAAGGGCGGCCTGGTGACCGTGGAGAAGCTGCTGCTGACCACCG KLLLTTERYLAEADAGAGGTACCTGGCCGAGGCCGACTACACCCACAAGGAGCTGTT YTHKELFDDWMDDCGACGACTGGATGGACGACAACATCGAGCAC NIEH 78 RISPSNLEQTKQQLR 159AGGATCAGCCCCAGCAACCTGGAGCAGACCAAGCAGCAGCTGA EELINLDHQYLDILDGGGAGGAGCTGATCAACCTGGACCACCAGTACCTGGACATCCTG FSIAGNVGARQFEVGACTTCAGCATCGCCGGCAACGTGGGCGCCAGGCAGTTCGAGGT RIVELLNEIIIAKHLSGAGGATCGTGGAGCTGCTGAACGAGATCATCATCGCCAAGCAC GGNRPEIIGFNPKENCTGAGCGGCGGCAACAGGCCCGAGATCATCGGCTTCAACCCCA 
PEDCIIMDSKAYKEAGGAGAACCCCGAGGACTGCATCATCATGGACAGCAAGGCCTA GFNIPANERDKMIRCAAGGAGGGCTTCAACATCCCCGCCAACGAGAGGGACAAGATG YVEEYNAKDNTLNATCAGGTACGTGGAGGAGTACAACGCCAAGGACAACACCCTGA NNKWWKNFESPNYACAACAACAAGTGGTGGAAGAACTTCGAGAGCCCCAACTACCC PTNQVKFSFVSSSFICACCAACCAGGTGAAGTTCAGCTTCGTGAGCAGCAGCTTCATCG GQFTNQLTYINNRTGCCAGTTCACCAACCAGCTGACCTACATCAACAACAGGACCAAC NVNGSAITAETLLRGTGAACGGCAGCGCCATCACCGCCGAGACCCTGCTGAGGAAGG KVENVMNVNTEYNTGGAGAACGTGATGAACGTGAACACCGAGTACAACCTGAACAA LNNFFEELGSNTLVCTTCTTCGAGGAGCTGGGCAGCAACACCCTGGTGGCC A 79 TFDSTVADNLKNLI 160ACCTTCGACAGCACCGTGGCCGACAACCTGAAGAACCTGATCCT LPKLKELDHKYLQAGCCCAAGCTGAAGGAGCTGGACCACAAGTACCTGCAGGCCATC IDIAYKRSNTTNHEGACATCGCCTACAAGAGGAGCAACACCACCAACCACGAGAACA NTLLEVLSADLFTKCCCTGCTGGAGGTGCTGAGCGCCGACCTGTTCACCAAGGAGATG EMDYHGKHLGGANGACTACCACGGCAAGCACCTGGGCGGCGCCAACAAGCCCGACG KPDGFVYDEETGWIGCTTCGTGTACGACGAGGAGACCGGCTGGATCCTGGACAGCAA LDSKAYRDGFAVTGGCCTACAGGGACGGCTTCGCCGTGACCGCCCACACCACCGACG AHTTDAMGRYIDQCCATGGGCAGGTACATCGACCAGTACAGGGACAGGGACGACAA YRDRDDKSTWWEDGAGCACCTGGTGGGAGGACTTCCCCAAGGACCTGCCCCAGACCT FPKDLPQTYFAYVSACTTCGCCTACGTGAGCGGCTTCTACATCGGCAAGTACCAGGAG GFYIGKYQEQLQDFCAGCTGCAGGACTTCGAGAACAGGAAGCACATGAAGGGCGGCC ENRKHMKGGLIEVTGATCGAGGTGGCCAAGCTGATCCTGCTGGCCGAGAAGTACAA AKLILLAEKYKENKGGAGAACAAGATCACCCACGACCAGATCACCCTGCAGATCCTG ITHDQITLQILNDHISAACGACCACATCAGCCAG Q 80 PLDVVEQMKAELR 161CCCCTGGACGTGGTGGAGCAGATGAAGGCCGAGCTGAGGCCCC PLLNHVNHRLLAIIDTGCTGAACCACGTGAACCACAGGCTGCTGGCCATCATCGACTTC FSYNMSRGDDKRLAGCTACAACATGAGCAGGGGCGACGACAAGAGGCTGGAGGACT EDYTAQIYKLISHDACACCGCCCAGATCTACAAGCTGATCAGCCACGACACCCACCTG THLLAGPSRPDVVSCTGGCCGGCCCCAGCAGGCCCGACGTGGTGAGCGTGATCAACG VINDLGIIIDSKAYKACCTGGGCATCATCATCGACAGCAAGGCCTACAAGCAGGGCTTC QGFNIPQAEEDKMVAACATCCCCCAGGCCGAGGAGGACAAGATGGTGAGGTACCTGG RYLDESIRRDPAINPACGAGAGCATCAGGAGGGACCCCGCCATCAACCCCACCAAGTG TKWWEYLGASTEYGTGGGAGTACCTGGGCGCCAGCACCGAGTACGTGTTCCAGTTCG VFQFVSSSFSSGASATGAGCAGCAGCTTCAGCAGCGGCGCCAGCGCCAAGCTGAGGCA KLRQIHRRSSIEGSIIGATCCACAGGAGGAGCAGCATCGAGGGCAGCATCATCACCGCC 
TAKNLLLLAENFLCAAGAACCTGCTGCTGCTGGCCGAGAACTTCCTGTGCACCAACAC TNTINIDLFRQNNEICATCAACATCGACCTGTTCAGGCAGAACAACGAGATC 81 QLVPSYITQTKLRLS 162CAGCTGGTGCCCAGCTACATCACCCAGACCAAGCTGAGGCTGAG GLINYIDHSYFDLIDCGGCCTGATCAACTACATCGACCACAGCTACTTCGACCTGATCG LGFDGRQNRLYELRACCTGGGCTTCGACGGCAGGCAGAACAGGCTGTACGAGCTGAG IVELLNLINSLKALHGATCGTGGAGCTGCTGAACCTGATCAACAGCCTGAAGGCCCTGC LSGGNRPEIIAYSPDACCTGAGCGGCGGCAACAGGCCCGAGATCATCGCCTACAGCCC VNPINGVIMDSKSYCGACGTGAACCCCATCAACGGCGTGATCATGGACAGCAAGAGC RGGFNIPNSERDKMTACAGGGGCGGCTTCAACATCCCCAACAGCGAGAGGGACAAGA IRYINEYNQKNPTLTGATCAGGTACATCAACGAGTACAACCAGAAGAACCCCACCCT NSNRWWENFRAPDGAACAGCAACAGGTGGTGGGAGAACTTCAGGGCCCCCGACTAC YPQSPLKYSFVSGNCCCCAGAGCCCCCTGAAGTACAGCTTCGTGAGCGGCAACTTCAT FIGQFLNQIQYILTQCGGCCAGTTCCTGAACCAGATCCAGTACATCCTGACCCAGACCG TGINGGAITSEKLIEGCATCAACGGCGGCGCCATCACCAGCGAGAAGCTGATCGAGAA KVNAVLNPNISYTIGGTGAACGCCGTGCTGAACCCCAACATCAGCTACACCATCAACA NNFFNDLGCNRLVACTTCTTCAACGACCTGGGCTGCAACAGGCTGGTGCAG Q
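
The entries tabulated above appear to pair each amino acid sequence with a nucleotide sequence encoding it. As a minimal sketch that is not part of the disclosure, the following Python snippet translates the first 42 nucleotides listed for SEQ ID NO: 156 under the standard genetic code and compares the result with the first 14 residues listed for SEQ ID NO: 75. The names `CODON_TABLE` and `translate` are illustrative assumptions, and the pairing of these two entries is inferred from the table layout.

```python
from itertools import product

# Standard genetic code with codons enumerated in T, C, A, G order
# ('*' marks a stop codon). Names here are illustrative, not from the disclosure.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence, reading complete codons from position 0."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Leading fragments copied from the table entries for SEQ ID NO: 156 and SEQ ID NO: 75.
prefix_nt = "ATGAACCCCAGGAACGAGATCGTGATCGCCAAGCACCTGAGC"
prefix_aa = "MNPRNEIVIAKHLS"

print(translate(prefix_nt) == prefix_aa)
```

The same check could be applied to any other amino acid and nucleotide pair in the table once the interleaved fragments have been separated into their two columns.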

In some embodiments, an endonuclease of the present disclosure can have a sequence of X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁X₁₂X₁₃X₁₄X₁₅X₁₆X₁₇X₁₈X₁₉X₂₀X₂₁X₂₂X₂₃X₂₄X₂₅X₂₆X₂₇X₂₈X₂₉X₃₀X₃₁X₃₂X₃₃X₃₄X₃₅X₃₆X₃₇X₃₈X₃₉X₄₀X₄₁X₄₂X₄₃X₄₄X₄₅X₄₆X₄₇X₄₈X₄₉X₅₀X₅₁X₅₂X₅₃X₅₄X₅₅GX₅₆HLGGX₅₇RX₅₈PDGX₅₉X₆₀X₆₁X₆₂X₆₃X₆₄X₆₅X₆₆X₆₇X₆₈X₆₉X₇₀X₇₁X₇₂X₇₃X₇₄GX₇₅₁X₇₆DTKX₇₇YX₇₈X₇₉GYX₈₀LPIX₈₁QX₈₂DEMX₈₃RYX₈₄X₈₅ENX₈₆X₈₇RX₈₈X₈₉X₉₀X₉₁NX₉₂NX₉₃WWX₉₄X₉₅X₉₆X₉₇X₉₈X₉₉X₁₀₀X₁₀₁X₁₀₂X₁₀₃X₁₀₄X₁₀₅X₁₀₆FX₁₀₇X₁₀₈X₁₀₉X₁₁₀FX₁₁₁GX₁₁₂X₁₁₃X₁₁₄X₁₁₅X₁₁₆X₁₁₇X₁₁₈RX₁₁₉X₁₂₀X₁₂₁X₁₂₂X₁₂₃X₁₂₄X₁₂₅X₁₂₆GX₁₂₇X₁₂₈X₁₂₉X₁₃₀X₁₃₁X₁₃₂X₁₃₃LLX₁₃₄X₁₃₅X₁₃₆X₁₃₇X₁₃₈X₁₃₉X₁₄₀X₁₄₁X₁₄₂X₁₄₃X₁₄₄X₁₄₅X₁₄₆X₁₄₇X₁₄₈X₁₄₉X₁₅₀X₁₅₁X₁₅₂X₁₅₃FX₁₅₄X₁₅₅X₁₅₆X₁₅₇X₁₅₈X₁₅₉X₁₆₀ (SEQ ID NO: 316), wherein X₁ is F, Q, N, D, or absent, X₂ is L, I, T, S, N, or absent, X₃ is V, I, G, A, E, T, or absent, X₄ is K, C, or absent, X₅ is G, S, or absent, X₆ is A, S, E, D, N, or absent, X₇ is M, I, V, Q, F, L, or absent, X₈ is E, S, T, N, or absent, X₉ is I, M, E, T, Q, or absent, X₁₀ is K, S, L, I, T, E, or absent, X₁₁ is K or absent, X₁₂ is S, A, E, D, or absent, X₁₃ is E, N, Q, K, or absent, X₁₄ is L, M, V, or absent, X₁₅ is R or absent, X₁₆ is H, D, T, G, E, N, or absent, X₁₇ is K, N, Q, E, A, or absent, X₁₈ is L or absent, X₁₉ is R, Q, N, T, D, or absent, X₂₀ is H, M, V, N, T, or absent, X₂₁ is V, L, I, or absent, X₂₂ is P, S, or absent, X₂₃ is H or absent, X₂₄ is E, D, or absent, X₂₅ is Y or absent, X₂₆ is I, L, or absent, X₂₇ is E, Q, G, S, A, Y, or absent, X₂₈ is L or absent, X₂₉ is I, V, L, or absent, X₃₀ is E, D, or absent, X₃₁ is I, L, or absent, X₃₂ is A, S, or absent, X₃₃ is Q, Y, F, or absent, X₃₄ is D or absent, X₃₅ is S, P, or absent, X₃₆ is K, Y, Q, T, or absent, X₃₇ is Q or absent, X₃₈ is N or absent, X₃₉ is R, K, or absent, X₄₀ is L, I, or absent, X₄₁ is L, F, or absent, X₄₂ is E or absent, X₄₃ is F, M, L, or absent, X₄₄ is V, T, or I, X₄₅ is V, M, L, or I, X₄₆ is E, D, or Q, X₄₇ is F or L, X₄₈ is F or L, X₄₉ is K, I, T, or V, X₅₀ is K, N, or E, X₅₁ is I or E, X₅₂ is Y, F, or C, X₅₃ is G or N, X₅₄ is Y or F, X₅₅ is R, S, N, E, K, or Q, X₅₆ is K, S, L, V, or T, X₅₇ is S, A, or V, X₅₈ is K or R, X₅₉ is A, I, or V, X₆₀ is L, M, V, I, or C, X₆₁ is F or Y, X₆₂ is T, A, or S, X₆₃ is K, E, or absent, X₆₄ is D, E, or absent, X₆₅ is E, A, or absent, X₆₆ is N, K, or absent, X₆₇ is E, S, or absent, X₆₈ is D, E, Q, A, or absent, X₆₉ is G, V, K, N, or absent, X₇₀ is L, G, E, S, or absent, X₇₁ is V, S, K, T, E, or absent, X₇₂ is L, H, K, E, Y, D, or A, X₇₃ is N, G, or D, X₇₄ is H, F, or Y, X₇₅ is I or V, X₇₆ is L, V, or I, X₇₇ is A or S, X₇₈ is K or S, X₇₉ is D, G, K, S, or N, X₈₀ is R, N, S, or G, X₈₁ is S, A, or G, X₈₂ is A, I, or V, X₈₃ is Q, E, I, or V, X₈₄ is V or I, X₈₅ is D, R, G, I, or E, X₈₆ is N, I, or Q, X₈₇ is K, D, T, E, or K, X₈₈ is S, N, D, or E, X₈₉ is Q, E, I, K, or A, X₉₀ is V, H, R, K, L, or E, X₉₁ is I, V, or R, X₉₂ is P, S, T, or R, X₉₃ is E, R, C, Q, or K, X₉₄ is E, N, or K, X₉₅ is I, V, N, E, or A, X₉₆ is Y or F, X₉₇ is P, G, or E, X₉₈ is T, E, S, D, K, or N, X₉₉ is S, D, K, G, N, or T, X₁₀₀ is I, T, V, or L, X₁₀₁ is T, N, G, or D, X₁₀₂ is D, E, T, K, or I, X₁₀₃ is F or Y, X₁₀₄ is K or Y, X₁₀₅ is F or Y, X₁₀₆ is L, S, or M, X₁₀₇ is V or I, X₁₀₈ is S or A, X₁₀₉ is G or A, X₁₁₀ is F, Y, H, E, or K, X₁₁₁ is Q, K, T, N, or I, X₁₁₂ is D, N, or K, X₁₁₃ is Y, F, I, or V, X₁₁₄ is R, E, K, Q, or F, X₁₁₅ is K, E, A, or N, X₁₁₆ is Q or K, X₁₁₇ is L or I, X₁₁₈ is E, D, N, or Q, X₁₁₉ is V, I, or L, X₁₂₀ is S, N, F, T, or Q, X₁₂₁ is H, I, C, or R, X₁₂₂ is L, D, N, S, or F, X₁₂₃ is T or K, X₁₂₄ is K, G, or N, X₁₂₅ is C, V, or I, X₁₂₆ is Q, L, K, or Y, X₁₂₇ is A, G, or N, X₁₂₈ is V or A, X₁₂₉ is M, L, I, V, or A, X₁₃₀ is S, T, or D, X₁₃₁ is V or I, X₁₃₂ is E, Q, K, S, or I, X₁₃₃ is Q, H, or T, X₁₃₄ is L, R, or Y, X₁₃₅ is G, I, L, or T, X₁₃₆ is G, A, or V, X₁₃₇ is E, N, or D, X₁₃₈ is K, Y, D, E, A, or R, X₁₃₉ is I, F, Y, or C, X₁₄₀ is K or R, X₁₄₁ is E, R, A, G, or T, X₁₄₂ is G or N, X₁₄₃ is S, I, K, R, or E, X₁₄₄ is L, I, or M, X₁₄₅ is T, S, D, or K, X₁₄₆ is L, H, Y, R, T, or F, X₁₄₇ is E, Y, I, M, A, or L, X₁₄₈ is E, D, R, or G, X₁₄₉ is V, F, M, L, or I, X₁₅₀ is G, K, R, L, V, or E, X₁₅₁ is K, N, D, L, H, or S, X₁₅₂ is K, L, C, or absent, X₁₅₃ is K, S, I, Y, M, or F, X₁₅₄ is K, L, C, H, D, Q, or N, X₁₅₅ is N or Y, X₁₅₆ is D, K, T, E, C, or absent, X₁₅₇ is E, V, R, or absent, X₁₅₈ is I, F, L, or absent, X₁₅₉ is V, Q, E, L, or absent, and X₁₆₀ is F or absent.

In some embodiments, an endonuclease of the present disclosure can have a sequence of X₁X₂X₃X₄X₅X₆X₇X₈X₉X₁₀X₁₁X₁₂X₁₃X₁₄X₁₅X₁₆X₁₇X₁₈X₁₉X₂₀X₂₁X₂₂X₂₃X₂₄X₂₅X₂₆X₂₇X₂₈X₂₉X₃₀X₃₁X₃₂X₃₃X₃₄X₃₅X₃₆X₃₇X₃₈X₃₉X₄₀X₄₁X₄₂X₄₃X₄₄X₄₅X₄₆X₄₇X₄₈X₄₉X₅₀X₅₁X₅₂X₅₃X₅₄X₅₅GX₅₆HLGGX₅₇RX₅₈PDGX₅₉X₆₀X₆₁X₆₂X₆₃X₆₄X₆₅X₆₆X₆₇X₆₈X₆₉X₇₀X₇₁X₇₂X₇₃X₇₄GX₇₅X₇₆DTKX₇₇YX₇₈X₇₉GYX₈₀LPIX₈₁QX₈₂DEMX₈₃RYX₈₄X₈₅ENX₈₆X₈₇RX₈₈X₈₉X₉₀X₉₁NX₉₂NX₉₃WWX₉₄X₉₅X₉₆X₉₇X₉₈X₉₉X₁₀₀X₁₀₁X₁₀₂X₁₀₃X₁₀₄X₁₀₅X₁₀₆FX₁₀₇X₁₀₈X₁₀₉X₁₁₀FX₁₁₁GX₁₁₂X₁₁₃X₁₁₄X₁₁₅X₁₁₆X₁₁₇X₁₁₈RX₁₁₉X₁₂₀X₁₂₁X₁₂₂X₁₂₃X₁₂₄X₁₂₅X₁₂₆GX₁₂₇X₁₂₈X₁₂₉X₁₃₀X₁₃₁X₁₃₂X₁₃₃LLX₁₃₄X₁₃₅X₁₃₆X₁₃₇X₁₃₈X₁₃₉X₁₄₀X₁₄₁X₁₄₂X₁₄₃X₁₄₄X₁₄₅X₁₄₆X₁₄₇X₁₄₈X₁₄₉X₁₅₀X₁₅₁X₁₅₂X₁₅₃FX₁₅₄X₁₅₅X₁₅₆X₁₅₇X₁₅₈X₁₅₉X₁₆₀ (SEQ ID NO: 317), wherein X₁ is F, Q, N, or absent, X₂ is L, I, T, S, or absent, X₃ is V, I, G, A, E, T, or absent, X₄ is K, C, or absent, X₅ is G, S, or absent, X₆ is A, S, E, D, or absent, X₇ is M, I, V, Q, F, L, or absent, X₈ is E, S, T, or absent, X₉ is I, M, E, T, Q, or absent, X₁₀ is K, S, L, I, T, E, or absent, X₁₁ is K or absent, X₁₂ is S, A, E, D, or absent, X₁₃ is E, N, Q, K, or absent, X₁₄ is L, M, V, or absent, X₁₅ is R or absent, X₁₆ is H, D, T, G, E, N, or absent, X₁₇ is K, N, Q, E, A, or absent, X₁₈ is L or absent, X₁₉ is R, Q, N, T, D, or absent, X₂₀ is H, M, V, N, T, or absent, X₂₁ is V, L, I, or absent, X₂₂ is P, S, or absent, X₂₃ is H or absent, X₂₄ is E, D, or absent, X₂₅ is Y or absent, X₂₆ is I, L, or absent, X₂₇ is E, Q, G, S, A, or absent, X₂₈ is L or absent, X₂₉ is I, V, L, or absent, X₃₀ is E, D, or absent, X₃₁ is I, L, or absent, X₃₂ is A, S, or absent, X₃₃ is Q, Y, F, or absent, X₃₄ is D or absent, X₃₅ is S, P, or absent, X₃₆ is K, Y, Q, T, or absent, X₃₇ is Q or absent, X₃₈ is N or absent, X₃₉ is R or absent, X₄₀ is L, I, or absent, X₄₁ is L, F, or absent, X₄₂ is E or absent, X₄₃ is F, M, L, or absent, X₄₄ is V, T, or I, X₄₅ is V, M, L, or I, X₄₆ is E, D, or Q, X₄₇ is F or L, X₄₈ is F or L, X₄₉ is K, I, T, or V, X₅₀ is K, N, or E, X₅₁ is I or E, X₅₂ is Y, F, or C, X₅₃ is G or N, X₅₄ is Y or F, X₅₅ is R, S, N, E, K, or Q, X₅₆ is K, S, L, V, or T, X₅₇ is S or A, X₅₈ is K or R, X₅₉ is A, I, or V, X₆₀ is L, M, V, I, or C, X₆₁ is F or Y, X₆₂ is T, A, or S, X₆₃ is K, E, or absent, X₆₄ is D, E, or absent, X₆₅ is E, A, or absent, X₆₆ is N, K, or absent, X₆₇ is E, S, or absent, X₆₈ is D, E, Q, A, or absent, X₆₉ is G, V, K, N, or absent, X₇₀ is L, G, E, S, or absent, X₇₁ is V, S, K, T, E, or absent, X₇₂ is L, H, K, E, Y, D, or A, X₇₃ is N, G, or D, X₇₄ is H, F, or Y, X₇₅ is I or V, X₇₆ is L, V, or I, X₇₇ is A or S, X₇₈ is K or S, X₇₉ is D, G, K, S, or N, X₈₀ is R, N, S, or G, X₈₁ is S, A, or G, X₈₂ is A, I, or V, X₈₃ is Q, E, I, or V, X₈₄ is V or I, X₈₅ is D, R, G, I, or E, X₈₆ is N, I, or Q, X₈₇ is K, D, T, E, or K, X₈₈ is S, N, D, or E, X₈₉ is Q, E, I, K, or A, X₉₀ is V, H, R, K, L, or E, X₉₁ is I, V, or R, X₉₂ is P, S, T, or R, X₉₃ is E, R, C, Q, or K, X₉₄ is E, N, or K, X₉₅ is I, V, N, E, or A, X₉₆ is Y or F, X₉₇ is P, G, or E, X₉₈ is T, E, S, D, K, or N, X₉₉ is S, D, K, G, N, or T, X₁₀₀ is I, T, V, or L, X₁₀₁ is T, N, G, or D, X₁₀₂ is D, E, T, K, or I, X₁₀₃ is F or Y, X₁₀₄ is K or Y, X₁₀₅ is F or Y, X₁₀₆ is L, S, or M, X₁₀₇ is V or I, X₁₀₈ is S or A, X₁₀₉ is G or A, X₁₁₀ is F, Y, H, E, or K, X₁₁₁ is Q, K, T, N, or I, X₁₁₂ is D, N, or K, X₁₁₃ is Y, F, I, or V, X₁₁₄ is R, E, K, Q, or F, X₁₁₅ is K, E, A, or N, X₁₁₆ is Q or K, X₁₁₇ is L or I, X₁₁₈ is E, D, N, or Q, X₁₁₉ is V, I, or L, X₁₂₀ is S, N, F, T, or Q, X₁₂₁ is H, I, C, or R, X₁₂₂ is L, D, N, S, or F, X₁₂₃ is T or K, X₁₂₄ is K, G, or N, X₁₂₅ is C, V, or I, X₁₂₆ is Q, L, K, or Y, X₁₂₇ is A, G, or N, X₁₂₈ is V or A, X₁₂₉ is M, L, I, V, or A, X₁₃₀ is S, T, or D, X₁₃₁ is V or I, X₁₃₂ is E, Q, K, S, or I, X₁₃₃ is Q, H, or T, X₁₃₄ is L, R, or Y, X₁₃₅ is G, I, L, or T, X₁₃₆ is G, A, or V, X₁₃₇ is E, N, or D, X₁₃₈ is K, Y, D, E, A, or R, X₁₃₉ is I, F, Y, or C, X₁₄₀ is K or R, X₁₄₁ is E, R, A, G, or T, X₁₄₂ is G or N, X₁₄₃ is S, I, K, R, or E, X₁₄₄ is L, I, or M, X₁₄₅ is T, S, D, or K, X₁₄₆ is L, H, Y, R, or T, X₁₄₇ is E, Y, I, M, or A, X₁₄₈ is E, D, R, or G, X₁₄₉ is V, F, M, L, or I, X₁₅₀ is G, K, R, L, V, or E, X₁₅₁ is K, N, D, L, H, or S, X₁₅₂ is K, L, C, or absent, X₁₅₃ is K, S, I, Y, M, or F, X₁₅₄ is K, L, C, H, D, Q, or N, X₁₅₅ is N or Y, X₁₅₆ is D, K, T, E, C, or absent, X₁₅₇ is E, V, R, or absent, X₁₅₈ is I, F, L, or absent, X₁₅₉ is V, Q, E, L, or absent, and X₁₆₀ is F or absent.

In some embodiments, an endonuclease of the present disclosure can have a sequence of X₁LVKSSX₂EEX₃KEELREKLX₄HLSHEYLX₅LX₆DLAYDSKQNRLFEMKVX₇ELLINECGYX₈GLHLGGSRKPDGIX₉YTEGLKX₁₀NYGIIIDTKAYSDGYNLPISQADEMERYIRENNTRNX₁₁X₁₂VNPNEWWENFPX₁₃NINEFYFLFVSGHFKGNX₁₄EEQLERISIX₁₅TX₁₆IKGAAMSVX₁₇TLLLLANEIKAGRLX₁₈LEEVX₁₉KYFDNKEIX₂₀F (SEQ ID NO: 318), wherein X₁ is F, Q, N, D, or absent, X₂ is M, I, V, Q, F, L, or absent, X₃ is K, S, L, I, T, E, or absent, X₄ is R, Q, N, T, D, or absent, X₅ is E, Q, G, S, A, Y, or absent, X₆ is I, V, L, or absent, X₇ is V, M, L, or I, X₈ is R, S, N, E, K, or Q, X₉ is L, M, V, I, or C, X₁₀ is L, H, K, E, Y, D, or A, X₁₁ is Q, E, I, K, or A, X₁₂ is V, H, R, K, L, or E, X₁₃ is T, E, S, D, K, or N, X₁₄ is Y, F, I, or V, X₁₅ is L, D, N, S, or F, X₁₆ is K, G, or N, X₁₇ is E, Q, K, S, or I, X₁₈ is T, S, D, or K, X₁₉ is G, K, R, L, V, or E, and X₂₀ is V, Q, E, L, or absent.

In some embodiments, an endonuclease of the present disclosure can have a sequence of X₁LVKSSX₂EEX₃KEELREKLX₄HLSHEYLX₅LX₆DLAYDSKQNRLFEMKVX₇ELLINECGYX₈GLHLGGSRKPDGIX₉YTEGLKX₁₀NYGIIIDTKAYSDGYNLPISQADEMERYIRENNTRNX₁₁X₁₂VNPNEWWENFPX₁₃NINEFYFLFVSGHFKGNX₁₄EEQLERISIX₁₅TX₁₆IKGAAMSVX₁₇TLLLLANEIKAGRLX₁₈LEEVX₁₉KYFDNKEIX₂₀F (SEQ ID NO: 319), wherein X₁ is F, Q, N, or absent, X₂ is M, I, V, Q, F, L, or absent, X₃ is K, S, L, I, T, E, or absent, X₄ is R, Q, N, T, D, or absent, X₅ is E, Q, G, S, A, or absent, X₆ is I, V, L, or absent, X₇ is V, M, L, or I, X₈ is R, S, N, E, K, or Q, X₉ is L, M, V, I, or C, X₁₀ is L, H, K, E, Y, D, or A, X₁₁ is Q, E, I, K, or A, X₁₂ is V, H, R, K, L, or E, X₁₃ is T, E, S, D, K, or N, X₁₄ is Y, F, I, or V, X₁₅ is L, D, N, S, or F, X₁₆ is K, G, or N, X₁₇ is E, Q, K, S, or I, X₁₈ is T, S, D, or K, X₁₉ is G, K, R, L, V, or E, and X₂₀ is V, Q, E, L, or absent. In some aspects, the cleavage domain comprises a sequence selected from SEQ ID NO: 316-SEQ ID NO: 319.

In some embodiments, an endonuclease of the present disclosure can have conserved amino acid residues at position 76 (D or E), position 98 (D), and position 100 (K), which together preserve catalytic function. In some embodiments, an endonuclease of the present disclosure can have conserved amino acid residues at position 114 (D) and position 118 (R), which together preserve dimerization of two cleavage domains.

In some embodiments, endonucleases disclosed herein (e.g., SEQ ID NO: 1-SEQ ID NO: 81 (nucleic acid sequences of SEQ ID NO: 82-SEQ ID NO: 162)) can have at least 33.3% divergence from SEQ ID NO: 163 (FokI) and can be immunologically orthogonal to SEQ ID NO: 163 (FokI). In some embodiments, an immunologically orthogonal endonuclease (e.g., SEQ ID NO: 1-SEQ ID NO: 81 (nucleic acid sequences of SEQ ID NO: 82-SEQ ID NO: 162)) can be administered to a patient who has already received, and thus can have an adverse immune reaction to, FokI. In some embodiments, endonucleases disclosed herein (e.g., SEQ ID NO: 1-SEQ ID NO: 81 (nucleic acid sequences of SEQ ID NO: 82-SEQ ID NO: 162)) can have at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, or at least 75% divergence from SEQ ID NO: 163 (FokI).
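By way of illustration only, percent divergence between two amino acid sequences could be computed as in the following sketch. The function name `percent_divergence`, the requirement that the two sequences be supplied already aligned (with "-" marking gaps), and the choice to count a gap opposite a residue as a difference are assumptions of this sketch; the disclosure does not specify an alignment method or a divergence formula.

```python
def percent_divergence(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns at which two sequences differ.

    Assumes both inputs come from the same pairwise alignment, so they
    have equal length and use '-' for gaps (hypothetical convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences carry a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    differences = sum(1 for a, b in columns if a != b)
    return 100.0 * differences / len(columns)


def meets_divergence(aligned_a: str, aligned_b: str,
                     threshold: float = 33.3) -> bool:
    """True when divergence is at least the given percentage."""
    return percent_divergence(aligned_a, aligned_b) >= threshold
```

Under these assumptions, a candidate endonuclease aligned to SEQ ID NO: 163 would satisfy the "at least 33.3% divergence" characteristic when `meets_divergence` returns `True`.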

In some embodiments, an endonuclease disclosed herein (e.g., SEQ ID NO: 1-SEQ ID NO: 81 (nucleic acid sequences of SEQ ID NO: 82-SEQ ID NO: 162)) can be fused to any nucleic acid binding domain disclosed herein to form a non-naturally occurring fusion protein. This fusion protein can have one or more of the following characteristics: (a) induces greater than 1% indels (insertions/deletions) at a target site; (b) the cleavage domain comprises a molecular weight of less than 23 kDa; (c) the cleavage domain comprises less than 196 amino acids; and (d) is capable of cleaving across a spacer region greater than 24 base pairs. In some embodiments, the non-naturally occurring fusion protein can induce greater than 5%, greater than 10%, greater than 20%, greater than 30%, greater than 40%, greater than 50%, greater than 60%, greater than 70%, greater than 80%, or greater than 90% indels at the target site. In some embodiments, indels are generated via the non-homologous end joining (NHEJ) pathway upon administration of a genome editing complex disclosed herein to a subject. Indels can be measured using deep sequencing.

DNA Binding Domains Fused to SEQ ID NO: 1-SEQ ID NO: 81 (Nucleic Acid Sequences of SEQ ID NO: 82-SEQ ID NO: 162)

The present disclosure provides for novel compositions of endonucleases with modular nucleic acid binding domains (e.g., TALEs, RNBDs, or MAP-NBDs) described herein. In some instances, the novel endonucleases can be fused to a DNA binding domain from Xanthomonas spp. (TALE), Ralstonia (RNBD), or an animal pathogen (MAP-NBD), resulting in genome editing complexes. A TALEN, RNBD-nuclease, or MAP-NBD-nuclease can include multiple components including the DNA binding domain, an optional linker, and a repressor domain. The genome editing complexes described herein can be used to selectively bind to and cleave a target gene sequence for genome editing purposes. For example, a DNA binding domain from Xanthomonas, Ralstonia, or an animal pathogen of the present disclosure can be used to direct the binding of a genome editing complex to a desired genomic sequence.

The genome editing complexes described herein, comprising a DNA binding domain fused to an endonuclease, can be used to edit genomic loci of interest by binding to a target nucleic acid sequence via the DNA binding domain and cleaving phosphodiester bonds of target double stranded DNA via the endonuclease.

In some aspects, DNA binding domains fused to nucleases can create a site-specific double-stranded DNA break. Such breaks can then be repaired by cellular machinery, through either homology-dependent repair or non-homologous end joining (NHEJ). Genome editing using DNA binding domains fused to nucleases described herein can thus be used to delete a sequence of interest (e.g., an aberrantly expressed or mutated gene) or to introduce a nucleic acid sequence of interest (e.g., a functional gene). DNA binding domains of the present disclosure can be programmed to deliver virtually any nuclease, including those disclosed herein, to any target site for therapeutic purposes, including ex vivo engineered cell therapies obtained using the compositions disclosed herein or gene therapy by direct in vivo administration of the compositions disclosed herein. In addition, the DNA binding domains can bind to specific DNA sequences and in some cases can activate the expression of host genes. In some instances, the disclosure provides for enzymes, e.g., SEQ ID NO: 1-SEQ ID NO: 81 (or any one of nucleic acid sequences of SEQ ID NO: 82-SEQ ID NO: 162), that can be fused to the DNA binding domains of TALEs, RNBDs, and MAP-NBDs. In some instances, enzymes of the disclosure, including SEQ ID NO: 1 (nucleic acid sequence of SEQ ID NO: 82), SEQ ID NO: 4 (nucleic acid sequence of SEQ ID NO: 85), and SEQ ID NO: 8 (nucleic acid sequence of SEQ ID NO: 89), can achieve greater than 30% indels via the NHEJ pathway on a target gene when fused to a DNA binding domain of a TALE, RNBD, or MAP-NBD.

A non-naturally occurring fusion protein of the disclosure, e.g., any one of SEQ ID NO: 1-SEQ ID NO: 81 (or any one of nucleic acid sequences of SEQ ID NO: 82-SEQ ID NO: 162) fused to a DNA binding domain, can comprise a repeat unit. A repeat unit can be from a wild-type DNA-binding domain (Ralstonia solanacearum, Xanthomonas spp., or an animal pathogen) or a modified repeat unit enhanced for specific recognition of a particular nucleic acid base. A modified repeat unit can comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25 or more mutations that can enhance the repeat module for specific recognition of a particular nucleic acid base. In some embodiments, a modified repeat unit is modified at amino acid position 2, 3, 4, 11, 12, 13, 21, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, or 35. In some embodiments, a modified repeat unit is modified at amino acid positions 12 or 13.

As described in further detail below, a non-naturally occurring fusion protein of the disclosure, e.g., any one of SEQ ID NO: 1-SEQ ID NO: 81 (or any one of nucleic acid sequences of SEQ ID NO: 82-SEQ ID NO: 162) fused to a plurality of repeat units (e.g., derived from Ralstonia solanacearum, Xanthomonas spp., or an animal pathogen), can further comprise a C-terminal truncation, which can serve as a linker between the DNA binding domain and the nuclease.

A non-naturally occurring fusion protein of the disclosure, e.g., any one of SEQ ID NO: 1-SEQ ID NO: 81 (or any one of nucleic acid sequences of SEQ ID NO: 82-SEQ ID NO: 162) fused to a DNA binding domain, can further comprise an N-terminal cap as described in further detail below. An N-terminal cap can be a polypeptide portion flanking the DNA-binding repeat unit. An N-terminal cap can be of any length and can comprise from 0 to 136 amino acid residues. An N-terminal cap can be 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, or 130 amino acid residues in length. In some embodiments, an N-terminal cap can modulate structural stability of the DNA-binding repeat units. In some embodiments, an N-terminal cap can modulate nonspecific interactions. In some cases, an N-terminal cap can decrease nonspecific interaction. In some cases, an N-terminal cap can reduce off-target effect. As used here, off-target effect refers to the interaction of a genome editing complex with a sequence that is not the target binding site of interest. An N-terminal cap can further comprise a wild-type N-terminal cap sequence of a protein from Ralstonia solanacearum, Xanthomonas spp., or an animal pathogen or can comprise a modified N-terminal cap sequence.

In some embodiments, a DNA binding domain comprises at least one repeat unit having a repeat variable diresidue (RVD), which contacts a target nucleic acid base. In some embodiments, a DNA binding domain comprises more than one repeat unit, each having an RVD, which contacts a target nucleic acid base. In some embodiments, the DNA binding domain comprises 1 to 50 RVDs. In some embodiments, the DNA binding domain components of the fusion proteins can be at least 14 RVDs, at least 15 RVDs, at least 16 RVDs, at least 17 RVDs, at least 18 RVDs, at least 19 RVDs, at least 20 RVDs, or at least 21 RVDs in length. In some embodiments, the DNA binding domains can be 16 to 21 RVDs in length.

In some embodiments, any one of the DNA binding domains described herein can bind to a region of interest of any gene. For example, the DNA binding domains described herein can bind upstream of the promoter region, upstream of the gene transcription start site, or downstream of the transcription start site. In certain embodiments, the DNA binding domain binding region is no farther than 50 base pairs downstream of the transcription start site. In some embodiments, the DNA binding domain is designed to bind in proximity to the transcription start site (TSS). In other embodiments, the TALE can be designed to bind in the 5′ UTR region.

A DNA binding domain described herein can comprise from 1 to 50 repeat units. A DNA binding domain described herein can comprise from 5 to 45, from 8 to 45, from 10 to 40, from 12 to 35, from 15 to 30, from 20 to 30, from 8 to 40, from 8 to 35, from 8 to 30, from 10 to 35, from 10 to 30, from 10 to 25, from 10 to 20, or from 15 to 25 repeat units.

A DNA binding domain described herein can comprise at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 45, 50, or more repeat units. A DNA binding domain described herein can comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 45, or 50 repeat units. A DNA binding domain described herein can comprise 5 repeat units. A DNA binding domain described herein can comprise 10 repeat units. A DNA binding domain described herein can comprise 11 repeat units. A DNA binding domain described herein can comprise 12 repeat units, or another suitable number.

A repeat unit of a DNA binding domain can be 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, or 40 residues in length.

In some embodiments, the effector can be a protein secreted from Xanthomonas or Ralstonia bacteria upon plant infection. In some embodiments, the effector can be a protein that is a mutated form of, or otherwise derived from, a protein secreted from Xanthomonas or Ralstonia bacteria. The effector can further comprise a DNA-binding module which includes a variable number of about 33-35 amino acid residue repeat units. Each amino acid repeat unit recognizes one base pair through two adjacent amino acids (e.g., at amino acid positions 12 and 13 of the repeat unit). As such, amino acid positions 12 and 13 of the repeat unit can also be referred to as repeat variable diresidue (RVD).
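As an illustration of the position numbering, the RVD of a repeat unit could be read out as in the following sketch. The function names and the two example 34-residue repeat sequences are illustrative assumptions and are not taken from the Sequence Listing. The sketch also assumes a repeat unit in which the RVD sits at positions 12 and 13 (1-based); in an expanded or contracted repeat unit the RVD position may differ.

```python
def extract_rvd(repeat_unit: str) -> str:
    """Return the residues at 1-based positions 12 and 13 of a repeat unit."""
    if len(repeat_unit) < 13:
        raise ValueError("repeat unit is too short to contain positions 12-13")
    # 1-based positions 12 and 13 correspond to 0-based indices 11 and 12.
    return repeat_unit[11:13]


def rvds_of_domain(repeat_units: list[str]) -> list[str]:
    """Read the RVD of each repeat unit, in order along the binding domain."""
    return [extract_rvd(unit) for unit in repeat_units]


# Illustrative TALE-like repeat units (hypothetical inputs for this sketch).
EXAMPLE_REPEATS = [
    "LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG",
    "LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG",
]

if __name__ == "__main__":
    print(rvds_of_domain(EXAMPLE_REPEATS))
```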

Linkers

A nuclease, e.g., any one of SEQ ID NO: 1-SEQ ID NO: 81 (or any one of nucleic acid sequences of SEQ ID NO: 82-SEQ ID NO: 162) fused to a DNA binding domain (e.g., an RNBD, a MAP-NBD, a TALE), can further include a linker connecting SEQ ID NO: 1-SEQ ID NO: 81 (or any one of nucleic acid sequences of SEQ ID NO: 82-SEQ ID NO: 162) to the DNA binding domain. A linker used herein can be a short flexible linker comprising 0 base pairs, 3 to 6 base pairs, 6 to 12 base pairs, 12 to 15 base pairs, 15 to 21 base pairs, 21 to 24 base pairs, 24 to 30 base pairs, 30 to 36 base pairs, 36 to 42 base pairs, 42 to 48 base pairs, or 1-48 base pairs. The nucleic acid sequence of the linker can encode an amino acid sequence comprising 0 residues, 1-3 residues, 4-7 residues, 8-10 residues, 10-12 residues, 12-15 residues, or 1-15 residues. Linkers can include, but are not limited to, residues such as glycine, methionine, aspartic acid, alanine, lysine, serine, leucine, threonine, tryptophan, or any combination thereof.

When linking a repressor domain to an RNBD, MAP-NBD, or TALE, the linker can have a nucleic acid sequence of GGCGGTGGCGGAGGGATGGATGCTAAGTCACTAACTGCCTGGTCC (SEQ ID NO: 165) and an amino acid sequence of GGGGGMDAKSLTAWS (SEQ ID NO: 166).
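The relationship between the linker nucleic acid sequence and its amino acid sequence (three nucleotides per residue, so 45 nucleotides encode 15 residues) can be checked with a short sketch such as the following. It assumes the standard genetic code and reading frame 1; the function and constant names are illustrative only.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

LINKER_DNA = "GGCGGTGGCGGAGGGATGGATGCTAAGTCACTAACTGCCTGGTCC"  # SEQ ID NO: 165


def translate(dna: str) -> str:
    """Translate a coding sequence in frame 1 using the standard code."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    print(translate(LINKER_DNA))  # GGGGGMDAKSLTAWS (SEQ ID NO: 166)
```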

A nuclease, e.g., any one of SEQ ID NO: 1-SEQ ID NO: 81 (or any one of nucleic acid sequences of SEQ ID NO: 82-SEQ ID NO: 162) can be connected to a DNA binding domain via a linker. The linker can be from 1 to 70 amino acid residues in length. A linker can be from 5 to 45, from 5 to 40, from 5 to 35, from 5 to 30, from 5 to 25, from 5 to 20, from 5 to 15, from 10 to 40, from 10 to 35, from 10 to 30, from 10 to 25, from 10 to 20, from 12 to 40, from 12 to 35, from 12 to 30, from 12 to 25, from 12 to 20, from 14 to 40, from 14 to 35, from 14 to 30, from 14 to 25, from 14 to 20, from 14 to 16, from 15 to 40, from 15 to 35, from 15 to 30, from 15 to 25, from 15 to 20, from 15 to 18, from 18 to 40, from 18 to 35, from 18 to 30, from 18 to 25, from 18 to 24, from 20 to 40, from 20 to 35, from 20 to 30, from 25 to 30, from 25 to 70, from 30 to 70, from 5 to 70, from 35 to 70, from 40 to 70, from 45 to 70, from 50 to 70, from 55 to 70, from 60 to 70, or from 65 to 70 amino acid residues in length.

A linker for linking a nuclease, e.g., any one of SEQ ID NO: 1-SEQ ID NO: 81 (or any one of nucleic acid sequences of SEQ ID NO: 82-SEQ ID NO: 162) to a DNA binding domain can be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, or 70 amino acid residues in length.

In some embodiments, the linker can be the N-terminus of a naturally occurring Ralstonia solanacearum-derived protein, Xanthomonas spp.-derived protein, or Legionella quateirensis-derived protein, wherein any functional domain disclosed herein is fused to the N-terminus of the engineered DNA binding domain. In some embodiments, the linker comprising the N-terminus can comprise the full length naturally occurring N-terminus of a naturally occurring Ralstonia solanacearum-derived protein, Xanthomonas spp.-derived protein, or Legionella quateirensis-derived protein, or a truncation of the naturally occurring N-terminus, such as amino acid residues at positions 1 to 137 of the naturally occurring Ralstonia solanacearum-derived protein N-terminus (e.g., SEQ ID NO: 264), positions 1 (H) to 115 (S) of the naturally occurring Ralstonia solanacearum-derived protein N-terminus (SEQ ID NO: 320), positions 1 (N) to 115 (S) of the naturally occurring Xanthomonas spp.-derived protein N-terminus (SEQ ID NO: 321), or positions 1 (G) to 115 (K) of the naturally occurring Legionella quateirensis-derived protein N-terminus (SEQ ID NO: 322). In some embodiments, the linker can comprise amino acid residues at positions 1 to 120 of the naturally occurring Ralstonia solanacearum-derived protein (SEQ ID NO: 303), Xanthomonas spp.-derived protein (SEQ ID NO: 301), or Legionella quateirensis-derived protein (SEQ ID NO: 304). In some embodiments, the linker can comprise the naturally occurring N-terminus of Ralstonia solanacearum truncated to any length. For example, the naturally occurring N-terminus of Ralstonia solanacearum can be truncated to amino acid residues at positions 1 to 120, 1 to 115, 1 to 50, 1 to 70, 1 to 100, 1 to 130, 10 to 40, 60 to 100, or 100 to 120 and used at the N-terminus of the engineered DNA binding domain as a linker to a nuclease or a repressor.

In other embodiments, the linker can be the C-terminus of a naturally occurring Ralstonia solanacearum-derived protein, Xanthomonas spp.-derived protein, or animal pathogen-derived protein, wherein any functional domain disclosed herein is fused to the C-terminus of the engineered DNA binding domain. In some embodiments, the linker comprising the C-terminus can comprise the full length naturally occurring C-terminus of a naturally occurring Ralstonia solanacearum-derived protein, Xanthomonas spp.-derived protein, or animal pathogen-derived protein, or a truncation of the naturally occurring C-terminus, such as positions 1 to 63 of the naturally occurring Ralstonia solanacearum-derived protein (SEQ ID NO: 266), Xanthomonas spp.-derived protein (SEQ ID NO: 298), or Legionella quateirensis-derived protein (SEQ ID NO: 306). In some embodiments, the naturally occurring C-terminus of a Ralstonia solanacearum-derived protein, Xanthomonas spp.-derived protein, or Legionella quateirensis-derived protein can be truncated to any length and used at the C-terminus of the engineered DNA binding domain as a linker to a nuclease or repressor. For example, the naturally occurring C-terminus of a Ralstonia solanacearum-derived protein, Xanthomonas spp.-derived protein, or Legionella quateirensis-derived protein can be truncated to amino acid residues at positions 1 to 63, 1 to 50, 1 to 70, 1 to 100, 1 to 120, 1 to 130, 10 to 40, 60 to 100, or 100 to 120 and used at the C-terminus of the engineered DNA binding domain.

Functional Domains

An RNBD (e.g., Ralstonia solanacearum-derived), or another binding domain (e.g., MAP-NBD or TALE), can be linked to a functional domain. The functional domain can provide different types of activity, such as genome editing, gene regulation (e.g., activation or repression), or visualization of a genomic locus via imaging.

A. Genome Editing Domains

For example, an RNBD (e.g., Ralstonia solanacearum-derived), or another binding domain (e.g., MAP-NBD or TALE), can be linked to a nuclease, wherein the RNBD provides specificity and targeting and the nuclease provides genome editing functionality. In some embodiments, the nuclease can be a cleavage domain, which dimerizes with another copy of the same cleavage domain to form an active full domain capable of cleaving DNA. In other embodiments, the nuclease can be a cleavage domain, which is capable of cleaving DNA without needing to dimerize. For example, a nuclease comprising a cleavage domain can be an endonuclease, such as FokI or BfiI. In some embodiments, two cleavage domains (e.g., FokI or BfiI) can be fused together to form a fully functional single cleavage domain. When cleavage domains are used as the nuclease, two RNBDs can be engineered, the first RNBD binding to a top strand of a target nucleic acid sequence and comprising a first FokI cleavage domain and a second RNBD binding to a bottom strand of a target nucleic acid sequence and comprising a second FokI cleavage domain.

In some embodiments, fully functional cleavage domains capable of cleaving DNA without needing to dimerize include meganucleases, also referred to as homing endonucleases. For example, a meganuclease can include I-AniI or I-OnuI. In some embodiments, the nuclease can be a type IIS restriction enzyme, such as FokI or BfiI.

A nuclease domain fused to an RNBD (e.g., Ralstonia solanacearum-derived), or another binding domain (e.g., MAP-NBD or TALE), can be an endonuclease or an exonuclease. An endonuclease can include restriction endonucleases and homing endonucleases. An endonuclease can also include S1 nuclease, mung bean nuclease, pancreatic DNase I, micrococcal nuclease, or yeast HO endonuclease. An exonuclease can include a 3′-5′ exonuclease or a 5′-3′ exonuclease. An exonuclease can also include a DNA exonuclease or an RNA exonuclease. Examples of exonucleases include exonucleases I, II, III, IV, V, and VIII; DNA polymerase I; RNA exonuclease 2; and the like.

A nuclease domain fused to an RNBD (e.g., Ralstonia solanacearum-derived), or another binding domain (e.g., MAP-NBD or TALE), can be a restriction endonuclease (or restriction enzyme). In some instances, a restriction enzyme cleaves DNA at a site removed from the recognition site and has separate binding and cleavage domains. In some instances, such a restriction enzyme is a Type IIS restriction enzyme.

A nuclease domain fused to an RNBD (e.g., Ralstonia solanacearum-derived), or another binding domain (e.g., MAP-NBD or TALE), can be a Type IIS nuclease. A Type IIS nuclease can be FokI or BfiI. In some cases, a nuclease domain fused to an RNBD (e.g., Ralstonia solanacearum-derived) is FokI. In other cases, a nuclease domain fused to an RNBD (e.g., Ralstonia solanacearum-derived) is BfiI.

FokI can be a wild-type FokI or can comprise one or more mutations. In some cases, FokI can comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more mutations. A mutation can enhance cleavage efficiency. A mutation can abolish cleavage activity. In some cases, a mutation can modulate homodimerization. For example, FokI can have a mutation at one or more amino acid residue positions 446, 447, 479, 483, 484, 486, 487, 490, 491, 496, 498, 499, 500, 531, 534, 537, and 538 to modulate homodimerization.

In some instances, a FokI cleavage domain is, for example, as described in Kim et al. “Hybrid restriction enzymes: Zinc finger fusions to Fok I cleavage domain,” PNAS 93: 1156-1160 (1996), which is incorporated herein by reference in its entirety. In some cases, a FokI cleavage domain described herein has a sequence as follows: QLVKSELEEKKSELRHKLKYVPHEYIELIEIARNSTQDRILEMKVMEFFMKVYGYRGKHLGGSRKPDGAIYTVGSPIDYGVIVDTKAYSGGYNLPIGQADEMQRYVEENQTRNKHINPNEWWKVYPSSVTEFKFLFVSGHFKGNYKAQLTRLNHITNCNGAVLSVEELLIGGEMIKAGTLTLEEVRRKFNNGEINF (SEQ ID NO: 163). In other instances, a FokI cleavage domain described herein is a FokI, for example, as described in U.S. Pat. No. 8,586,526, which is incorporated herein by reference in its entirety.

An RNBD (e.g., Ralstonia solanacearum-derived) can be linked to a functional group that modifies DNA nucleotides, for example an adenosine deaminase.

In some embodiments, an RNBD (e.g., Ralstonia solanacearum-derived) can be linked to any nuclease as set forth in TABLE 7 showing exemplary amino acid sequences (SEQ ID NO: 1-SEQ ID NO: 81) of endonucleases for genome editing and the corresponding back-translated nucleic acid sequences (SEQ ID NO: 82-SEQ ID NO: 162) of the endonucleases.

For purposes of gene editing, a first DNA binding domain (e.g., of a TALE, RNBD, or MAP-NBD) linked to a cleavage domain and a second DNA binding domain (e.g., of a TALE, RNBD, or MAP-NBD) linked to a cleavage domain can be provided. The first DNA binding domain (e.g., of a TALE, RNBD, or MAP-NBD) linked to a cleavage domain can recognize a top strand of double stranded DNA and bind to said region of double stranded DNA. The second DNA binding domain (e.g., of a TALE, RNBD, or MAP-NBD) linked to a cleavage domain can recognize a separate, non-overlapping bottom strand of double stranded DNA and bind to said region of double stranded DNA. The target nucleic acid sequence on the bottom strand can have its complementary nucleic acid sequence in the top strand positioned 10 to 20 nucleotides towards the 3′ end from the first region. In some embodiments, this stretch of 10 to 20 nucleotides can be referred to as the spacer region. In some embodiments, the first DNA binding domain (e.g., of a TALE, RNBD, or MAP-NBD) linked to a cleavage domain and the second DNA binding domain (e.g., of a TALE, RNBD, or MAP-NBD) linked to a cleavage domain both bind at a target site, allowing for dimerization of the two cleavage domains in the spacer region and allowing for catalytic activity and cleaving of the target DNA.
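The spacer geometry described above can be sketched as follows. The coordinate convention (1-based, inclusive positions on the top strand) and the function names are assumptions made for this illustration; the 10 to 20 nucleotide window is the one stated in the preceding paragraph.

```python
def spacer_length(first_site_end: int, second_site_start: int) -> int:
    """Number of top-strand nucleotides between two binding sites.

    first_site_end: last top-strand position covered by the first site.
    second_site_start: first top-strand position covered by the complement
    of the second (bottom-strand) site. Both are 1-based and inclusive.
    """
    if second_site_start <= first_site_end:
        raise ValueError("binding sites overlap or are in the wrong order")
    return second_site_start - first_site_end - 1


def spacer_in_range(length: int, minimum: int = 10, maximum: int = 20) -> bool:
    """True when the spacer falls within the stated 10 to 20 nucleotide window."""
    return minimum <= length <= maximum
```

For instance, a first site ending at position 20 and a second site starting at position 36 leave a 15-nucleotide spacer, which falls inside the window.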

a. Potency and Specificity of Genome Editing

In some embodiments, the efficiency of genome editing with a genome editing complex of the present disclosure (e.g., any one of an RNBD, MAP-NBD, or TALE fused to any nuclease disclosed herein) can be determined. Specifically, the potency and specificity of the genome editing complex can indicate whether a particular modular nucleic acid binding domain fused to a nuclease provides efficient editing. Potency can be defined as the percent indels (insertions/deletions) that are generated via the non-homologous end joining (NHEJ) pathway at a target site after administering a modular nucleic acid binding domain fused to a nuclease to a subject. A modular nucleic acid binding domain can have a potency of greater than 50%, greater than 55%, greater than 60%, greater than 65%, greater than 70%, greater than 75%, greater than 80%, greater than 85%, greater than 90%, greater than 92%, greater than 95%, greater than 97%, or greater than 99%. A modular nucleic acid binding domain can have a potency of from 50% to 100%, 50% to 60%, 60% to 70%, 70% to 80%, 80% to 90%, or 90% to 100%.

Specificity can be defined as a specificity ratio, wherein the ratio is the percent indels at a target site of interest over the percent indels at the top-ranked off-target site for a particular genome editing complex (e.g., any DNA binding domain linked to a nuclease described herein) of interest. A high specificity ratio would indicate that a modular nucleic acid binding domain fused to a nuclease edits primarily at the desired target site and exhibits fewer instances of undesirable, off-target editing. A low specificity ratio would indicate that a modular nucleic acid binding domain fused to a nuclease does not edit efficiently at the desired target site and/or can indicate that the modular nucleic acid binding domain fused to a nuclease exhibits high off-target activity. A modular nucleic acid binding domain can have a specificity ratio for the target site of at least 50:1, 55:1, 60:1, 65:1, 70:1, 75:1, 80:1, 85:1, 90:1, 92:1, 95:1, 97:1, 99:1, 50:2, 55:2, 60:2, 65:2, 70:2, 75:2, 80:2, 85:2, 90:2, 92:2, 95:2, 97:2, 99:2, 50:3, 55:3, 60:3, 65:3, 70:3, 75:3, 80:3, 85:3, 90:3, 92:3, 95:3, 97:3, 99:3, 50:4, 55:4, 60:4, 65:4, 70:4, 75:4, 80:4, 85:4, 90:4, 92:4, 95:4, 97:4, 99:4, 50:5, 55:5, 60:5, 65:5, 70:5, 75:5, 80:5, 85:5, 90:5, 92:5, 95:5, 97:5, or 99:5. Percent indels generated via non-homologous end joining (NHEJ) can be measured via deep sequencing techniques.
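The specificity ratio defined above is a simple quotient of two measured indel percentages, and could be computed as in the following sketch. The function names, the treatment of a zero off-target percentage as an unbounded ratio, and the example values are assumptions of the sketch rather than part of the disclosure.

```python
import math


def specificity_ratio(on_target_pct: float, off_target_pct: float) -> float:
    """Percent indels at the target site over percent indels at the
    top-ranked off-target site."""
    if on_target_pct < 0 or off_target_pct < 0:
        raise ValueError("indel percentages cannot be negative")
    if off_target_pct == 0:
        # No detectable off-target editing: treat the ratio as unbounded.
        return math.inf
    return on_target_pct / off_target_pct


def meets_ratio(on_target_pct: float, off_target_pct: float,
                numerator: float = 50, denominator: float = 1) -> bool:
    """True when the measured ratio is at least numerator:denominator.

    Cross-multiplication avoids dividing by a zero off-target percentage.
    """
    return on_target_pct * denominator >= off_target_pct * numerator
```

For example, 80% indels at the target site and 1% at the top-ranked off-target site give a ratio of 80:1, which satisfies an "at least 50:1" threshold, whereas 80% and 2% give 40:1, which does not.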

In some embodiments, the composition further comprises a cleavage domain linked to the modular nucleic acid binding domain to form a non-naturally occurring fusion protein. In some aspects, the modular nucleic acid binding domain comprises a potency for a target site greater than 65% and a specificity ratio for the target site of 50:1; and a functional domain; wherein the modular nucleic acid binding domain comprises a plurality of repeat units, wherein at least one repeat unit of the plurality comprises a binding region configured to bind to a target nucleic acid base in the target site, wherein the potency comprises indel percentage at the target site, and wherein the specificity ratio comprises indel percentage at the target site over indel percentage at a top-ranked off-target site of the non-naturally occurring fusion protein.

In some embodiments, the repeat unit comprises a sequence of A₁₋₁₁X₁X₂B₁₄₋₃₅ (SEQ ID NO: 448), wherein each amino acid residue of A₁₋₁₁ comprises any amino acid residue; wherein X₁X₂ comprises the binding region; wherein each amino acid residue of B₁₄₋₃₅ comprises any amino acid; and wherein a first repeat unit of the plurality of repeat units comprises at least one residue in A₁₋₁₁, B₁₄₋₃₅, or a combination thereof that differs from a corresponding residue in a second repeat unit of the plurality of repeat units.

In some embodiments, the binding region comprises an amino acid residue at position 13 or an amino acid residue at position 12 and the amino acid residue at position 13. In further aspects, the amino acid residue at position 13 binds to the target nucleic acid base. In still further aspects, the amino acid residue at position 12 stabilizes the configuration of the binding region. In some aspects, the indel percentage is measured by deep sequencing. In some aspects, the modular nucleic acid binding domain further comprises one or more properties selected from the following: (a) binds the target site, wherein the target site comprises a 5′ guanine; (b) comprises from 7 repeat units to 25 repeat units; and (c) upon binding to the target site, the modular nucleic acid binding domain is separated from a second modular nucleic acid binding domain bound to a second target site by from 2 to 50 base pairs.

The top-ranked off-target site for a composition (e.g., a modular nucleic acid binding domain linked to a cleavage domain) can be determined using the predicted report of genome-wide nuclease off-target sites (PROGNOS) ranking algorithms as described in Fine et al. (Nucleic Acids Res. 2014 April; 42(6):e42. doi: 10.1093/nar/gkt1326. Epub 2013 Dec. 30.). As described in Fine et al., the PROGNOS algorithm TALEN v2.0 can use the DNA target sequence as input; prior construction and experimental characterization of the specific nucleases are not necessary. Based on the differences between the sequence of a potential off-target site in the genome and the intended target sequence, the algorithm can generate a score for each potential off-target site, and the scores are used to rank the sites. If two (or more) potential off-target sites have equal scores, they can be further ranked by the type of genomic region annotated for each site with the following order: Exon>Promoter>Intron>Intergenic. A final ranking by chromosomal location can be employed as a tie-breaker to ensure consistency in the ranking order.
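The tie-breaking order described above (score, then genomic region type, then chromosomal location) can be expressed as a multi-key sort, as in the following sketch. This is not the PROGNOS implementation and does not compute PROGNOS scores; the `OffTargetSite` record, the assumption that a higher score ranks first, and the plain string comparison of chromosome names are all simplifications introduced for illustration.

```python
from dataclasses import dataclass

# Region types in the stated priority order: Exon > Promoter > Intron > Intergenic.
REGION_PRIORITY = {"Exon": 0, "Promoter": 1, "Intron": 2, "Intergenic": 3}


@dataclass(frozen=True)
class OffTargetSite:
    chromosome: str   # e.g. "chr1" (hypothetical naming)
    position: int     # start coordinate on the chromosome
    score: float      # precomputed ranking score (assumed: higher ranks first)
    region: str       # one of the keys of REGION_PRIORITY


def rank_off_target_sites(sites: list[OffTargetSite]) -> list[OffTargetSite]:
    """Order sites by score, then region type, then chromosomal location."""
    return sorted(
        sites,
        key=lambda s: (-s.score, REGION_PRIORITY[s.region],
                       s.chromosome, s.position),
    )


def top_ranked(sites: list[OffTargetSite]) -> OffTargetSite:
    """Return the top-ranked off-target site under the ordering above."""
    return rank_off_target_sites(sites)[0]
```

Because the sort key is fully determined by the four fields, the same input set always yields the same ranking, which is the consistency property the final tie-breaker is meant to provide.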

B. Regulatory Domains

As another example, an RNBD (e.g., Ralstonia solanacearum-derived), or another binding domain (e.g., MAP-NBD or TALE), can be linked to a gene regulating domain. A gene regulation domain can be an activator or a repressor. For example, an RNBD (e.g., Ralstonia solanacearum-derived), or another binding domain (e.g., MAP-NBD or TALE), can be linked to an activation domain, such as VP16, VP64, p65, p300 catalytic domain, TET1 catalytic domain, TDG, Ldb1 self-associated domain, SAM activator (VP64, p65, HSF1), or VPR (VP64, p65, Rta). Alternatively, an RNBD (e.g., Ralstonia solanacearum-derived), or another binding domain (e.g., MAP-NBD or TALE), can be linked to a repressor, such as KRAB, Sin3a, LSD1, SUV39H1, G9A (EHMT2), DNMT1, DNMT3A-DNMT3L, DNMT3B, KOX, TGF-beta-inducible early gene (TIEG), v-erbA, SID, MBD2, MBD3, Rb, or MeCP2.

In some embodiments, an RNBD (e.g., Ralstonia solanacearum-derived), or another binding domain (e.g., MAP-NBD or TALE), can be linked to a DNA modifying protein, such as DNMT3a. An RNBD (e.g., Ralstonia solanacearum-derived), or another binding domain (e.g., MAP-NBD or TALE), can be linked to a chromatin-modifying protein, such as lysine-specific histone demethylase 1 (LSD1). An RNBD (e.g., Ralstonia solanacearum-derived), or another binding domain (e.g., MAP-NBD or TALE), can be linked to a protein that is capable of recruiting other proteins, such as KRAB. The DNA modifying protein (e.g., DNMT3a) and proteins capable of recruiting other proteins (e.g., KRAB) can serve as repressors of transcription. Thus, RNBDs (e.g., Ralstonia solanacearum-derived), or another binding domain (e.g., MAP-NBD or TALE), linked to a DNA modifying protein (e.g., DNMT3a) or a domain capable of recruiting other proteins (e.g., KRAB, a domain found in transcriptional repressors, such as Kox1) can serve as transcription factors, wherein the RNBD (e.g., Ralstonia solanacearum-derived), or another binding domain (e.g., MAP-NBD or TALE), provides specificity and targeting and the DNA modifying protein and the protein capable of recruiting other proteins provide gene repression functionality. Such a fusion can be referred to as a TALE-transcription factor (TALE-TF), RNBD-transcription factor (RNBD-TF), or MAP-NBD-transcription factor (MAP-NBD-TF).

In some embodiments, expression of the target gene can be reduced by at least 5%, at least 10%, at least 15%, at least 20%, at least 25%, at least 30%, at least 35%, at least 40%, at least 45%, at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 97%, or at least 99% by using a DNA binding domain fused to a repression domain (e.g., an RNBD-TF, a MAP-NBD-TF, or TALE-TF) of the present disclosure as compared to non-treated cells. In some embodiments, expression of the target gene can be reduced by 5% to 10%, 10% to 15%, 15% to 20%, 20% to 25%, 25% to 30%, 30% to 35%, 35% to 40%, 40% to 45%, 45% to 50%, 50% to 55%, 55% to 60%, 60% to 65%, 65% to 70%, 70% to 75%, 75% to 80%, 80% to 85%, 85% to 90%, 90% to 95%, or 95% to 99% by using an RNBD-TF, a MAP-NBD-TF, or TALE-TF of the present disclosure as compared to non-treated cells. In some embodiments, expression of the checkpoint gene can be reduced by over 90% by using an RNBD-TF, a MAP-NBD-TF, or TALE-TF of the present disclosure as compared to non-treated cells.

In some embodiments, repression of the target gene with a DNA binding domain fused to a repression domain (e.g., an RNBD-TF, a MAP-NBD-TF, or TALE-TF) of the present disclosure and subsequent reduced expression of the target gene can last for at least 1 day, at least 2 days, at least 3 days, at least 4 days, at least 5 days, at least 6 days, at least 7 days, at least 8 days, at least 9 days, at least 10 days, at least 11 days, at least 12 days, at least 13 days, at least 14 days, at least 15 days, at least 16 days, at least 17 days, at least 18 days, at least 19 days, at least 20 days, at least 21 days, at least 22 days, at least 23 days, at least 24 days, at least 25 days, at least 26 days, at least 27 days, or at least 28 days. In some embodiments, repression of the target gene with an RNBD-TF, a MAP-NBD-TF, or TALE-TF of the present disclosure and subsequent reduced expression of the target gene can last for 1 day to 3 days, 3 days to 5 days, 5 days to 7 days, 7 days to 9 days, 9 days to 11 days, 11 days to 13 days, 13 days to 15 days, 15 days to 17 days, 17 days to 19 days, 19 days to 21 days, 21 days to 23 days, 23 days to 25 days, or 25 days to 28 days.

In various aspects, the present disclosure provides a method of identifying a target binding site in a target gene of a cell, the method comprising: (a) contacting a cell with an engineered genomic regulatory complex comprising a DNA binding domain, a repressor domain, and a linker; (b) measuring expression of the target gene; and (c) determining expression of the target gene is repressed by at least 50%, at least 60%, at least 70%, at least 80%, at least 85%, at least 90%, at least 92%, at least 95%, at least 97%, or at least 99% for at least 3 days, wherein the target gene is selected from: a checkpoint gene and a T cell surface receptor.
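Step (c) of the method above amounts to comparing measured expression against a threshold over a time course. The following is a minimal sketch under stated assumptions: repression is computed as the percent reduction in expression relative to non-treated cells, expression values are arbitrary positive units from any assay, and "for at least 3 days" is read as every measurement meeting the threshold with the time course extending to day 3 or later. The function names and the data layout are hypothetical.

```python
def percent_repression(treated: float, untreated: float) -> float:
    """Percent reduction in expression relative to non-treated cells."""
    if untreated <= 0:
        raise ValueError("untreated expression must be positive")
    return 100.0 * (1.0 - treated / untreated)


def is_sustained_repression(timecourse: dict, threshold_pct: float = 50.0,
                            min_days: int = 3) -> bool:
    """timecourse maps day -> (treated expression, untreated expression).

    Assumed reading of step (c): the time course must reach min_days, and
    every measurement in it must meet the repression threshold.
    """
    if not timecourse or max(timecourse) < min_days:
        return False
    return all(
        percent_repression(treated, untreated) >= threshold_pct
        for treated, untreated in timecourse.values()
    )


if __name__ == "__main__":
    measurements = {1: (40.0, 100.0), 2: (30.0, 100.0), 3: (20.0, 100.0)}
    print(is_sustained_repression(measurements, threshold_pct=50.0))
```

The threshold and duration are parameters so that the same check covers the other recited values (e.g., at least 90% repression).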

In some aspects, expression of the target gene is repressed in at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% of a plurality of the cells. In some aspects, the engineered genomic regulatory complex is undetectable after at least 3 days. In some aspects, whether the engineered genomic regulatory complex is undetectable is determined by qPCR, imaging of a FLAG-tag, or a combination thereof. In some aspects, measuring expression of the target gene comprises flow cytometry quantification of expression of the target gene.

In some embodiments, repression of the target gene with a DNA binding domain fused to a repression domain (e.g., an RNBD-TF, a MAP-NBD-TF, or TALE-TF) of the present disclosure can last even after the DNA binding domain-gene regulator becomes undetectable. The DNA binding domain fused to a repression domain (e.g., an RNBD-TF, a MAP-NBD-TF, or TALE-TF) can become undetectable after at least 3 days. In some embodiments, the DNA binding domain fused to a repression domain (e.g., an RNBD-TF, a MAP-NBD-TF, or TALE-TF) can become undetectable after at least 1 day, at least 2 days, at least 3 days, at least 4 days, at least 5 days, at least 6 days, at least 1 week, at least 2 weeks, at least 3 weeks, or at least 4 weeks. In some embodiments, qPCR or imaging via the FLAG-tag can be used to confirm that the DNA binding domain fused to a repression domain (e.g., an RNBD-TF, a MAP-NBD-TF, or TALE-TF) is no longer detectable.

C. Imaging Moieties

An RNBD (e.g., Ralstonia solanacearum-derived), or another binding domain (e.g., MAP-NBD or TALE), can be linked to a fluorophore, such as hydroxycoumarin, methoxycoumarin, Alexa fluor, aminocoumarin, Cy2, FAM, Alexa fluor 488, Fluorescein FITC, Alexa fluor 430, Alexa fluor 532, HEX, Cy3, TRITC, Alexa fluor 546, Alexa fluor 555, R-phycoerythrin (PE), Rhodamine Red-X, TAMRA, Cy3.5, Rox, Alexa fluor 568, Red 613, Texas Red, Alexa fluor 594, Alexa fluor 633, Allophycocyanin, Cy5, Alexa fluor 660, Cy5.5, TruRed, Alexa fluor 680, Cy7, GFP, or mCherry. An RNBD (e.g., Ralstonia solanacearum-derived) can be linked to a biotinylation reagent.

Genes and Indications of Interest

In some embodiments, genome editing can be performed by fusing a nuclease of the present disclosure with a DNA binding domain for a particular genomic locus of interest. Genetic modification can involve introducing a functional gene for therapeutic purposes, knocking out a gene for therapeutic purposes, or engineering a cell ex vivo (e.g., HSCs or CAR T cells) to be administered back into a subject in need thereof. For example, the genome editing complex can have a target site within PDCD1, CTLA4, LAG3, TET2, BTLA, HAVCR2, CCR5, CXCR4, TRA, TRB, B2M, albumin, HBB, HBA1, TTR, NR3C1, CD52, erythroid specific enhancer of the BCL11A gene, CBLB, TGFBR1, SERPINA1, HBV genomic DNA in infected cells, CEP290, DMD, CFTR, IL2RG, CS-1, or any combination thereof. In some embodiments, a genome editing complex can cleave double stranded DNA at a target site in order to insert a chimeric antigen receptor (CAR), alpha-L-iduronidase (IDUA), iduronate-2-sulfatase (IDS), or Factor 9 (F9). Cells, such as hematopoietic stem cells (HSCs) and T cells, can be engineered ex vivo with the genome editing complex. Alternatively, genome editing complexes can be directly administered to a subject in need thereof.

The subject receiving treatment can be suffering from a disease such as transthyretin amyloidosis (ATTR), HIV, glioblastoma multiforme, cancer, acute lymphoblastic leukemia, acute myeloid leukemia, beta-thalassemia, sickle cell disease, MPSI, MPSII, Hemophilia B, multiple myeloma, melanoma, sarcoma, Leber congenital amaurosis (LCA10), CD19 malignancies, BCMA-related malignancies, Duchenne muscular dystrophy (DMD), cystic fibrosis, alpha-1 antitrypsin deficiency, X-linked severe combined immunodeficiency (X-SCID), or Hepatitis B.

Samples for Analysis

In some aspects, described herein are methods of modifying the genetic material of a target cell utilizing an RNBD described herein. A sample described herein may be a fresh sample. The sample may be a live sample.

The sample may be a cell sample. The cell sample may be obtained from the cells or tissue of an animal. The animal cell may comprise a cell from an invertebrate, fish, amphibian, reptile, or mammal. The mammalian cell may be obtained from a primate, ape, equine, bovine, porcine, canine, feline, or rodent. The mammal may be a primate, ape, dog, cat, rabbit, ferret, or the like. The rodent may be a mouse, rat, hamster, gerbil, chinchilla, or guinea pig. The bird cell may be from a canary, parakeet, or parrot. The reptile cell may be from a turtle, lizard, or snake. The fish cell may be from a tropical fish. For example, the fish cell may be from a zebrafish (such as Danio rerio). The amphibian cell may be from a frog. An invertebrate cell may be from an insect, arthropod, marine invertebrate, or worm. The worm cell may be from a nematode (such as Caenorhabditis elegans). The arthropod cell may be from a tarantula or hermit crab.

The cell sample may be obtained from a mammalian cell. For example, the mammalian cell may be an epithelial cell, connective tissue cell, hormone secreting cell, a nerve cell, a skeletal muscle cell, a blood cell, an immune system cell, or a stem cell. A cell may be a fresh cell, live cell, fixed cell, intact cell, or cell lysate. Cell samples can be any primary cell, such as a hematopoietic stem cell (HSC) or naïve or stimulated T cells (e.g., CD4+ T cells).

Cell samples may be cells derived from a cell line, such as an immortalized cell line. Exemplary cell lines include, but are not limited to, 293A cell line, 293FT cell line, 293F cell line, 293 H cell line, HEK 293 cell line, CHO DG44 cell line, CHO-S cell line, CHO-K1 cell line, Expi293F™ cell line, Flp-In™ T-REx™ 293 cell line, Flp-In™-293 cell line, Flp-In™-3T3 cell line, Flp-In™-BHK cell line, Flp-In™-CHO cell line, Flp-In™-CV-1 cell line, Flp-In™-Jurkat cell line, FreeStyle™ 293-F cell line, FreeStyle™ CHO-S cell line, GripTite™ 293 MSR cell line, GS-CHO cell line, HepaRG™ cell line, T-REx™ Jurkat cell line, Per.C6 cell line, T-REx™-293 cell line, T-REx™-CHO cell line, T-REx™-HeLa cell line, NC-HIMT cell line, PC12 cell line, A549 cells, and K562 cells.

In some embodiments, an RNBD of the present disclosure can be used to modify a target cell. The target cell can itself be unmodified or modified. For example, an unmodified cell can be edited with an RNBD of the present disclosure to introduce an insertion, deletion, or mutation in its genome. In some embodiments, a modified cell already having a mutation can be repaired with an RNBD of the present disclosure.

In some instances, a target cell is a cell comprising one or more single nucleotide polymorphisms (SNPs). In some instances, an RNBD-nuclease described herein is designed to target and edit a target cell comprising a SNP.

In some cases, a target cell is a cell that does not contain a modification. For example, a target cell can comprise a genome without genetic defect (e.g., without genetic mutation) and an RNBD-nuclease described herein can be used to introduce a modification (e.g., a mutation) within the genome.

The cell sample may be obtained from cells of a primate. The primate may be a human, or a non-human primate. The cell sample may be obtained from a human. For example, the cell sample may comprise cells obtained from blood, urine, stool, saliva, lymph fluid, cerebrospinal fluid, synovial fluid, cystic fluid, ascites, pleural effusion, amniotic fluid, chorionic villus sample, vaginal fluid, interstitial fluid, buccal swab sample, sputum, bronchial lavage, Pap smear sample, or ocular fluid. The cell sample may comprise cells obtained from a blood sample, an aspirate sample, or a smear sample.

The cell sample may be a circulating tumor cell sample. A circulating tumor cell sample may comprise lymphoma cells, fetal cells, apoptotic cells, epithelial cells, endothelial cells, stem cells, progenitor cells, mesenchymal cells, osteoblast cells, osteocytes, hematopoietic stem cells (HSC) (e.g., a CD34+ HSC), foam cells, adipose cells, transcervical cells, circulating cardiocytes, circulating fibrocytes, circulating cancer stem cells, circulating myocytes, circulating cells from a kidney, circulating cells from a gastrointestinal tract, circulating cells from a lung, circulating cells from reproductive organs, circulating cells from a central nervous system, circulating hepatic cells, circulating cells from a spleen, circulating cells from a thymus, circulating cells from a thyroid, circulating cells from an endocrine gland, circulating cells from a parathyroid, circulating cells from a pituitary, circulating cells from an adrenal gland, circulating cells from islets of Langerhans, circulating cells from a pancreas, circulating cells from a hypothalamus, circulating cells from prostate tissues, circulating cells from breast tissues, circulating retinal cells, circulating ophthalmic cells, circulating auditory cells, circulating epidermal cells, circulating cells from the urinary tract, or combinations thereof.

The cell can be a T cell. For example, in some embodiments, the T cell can be an engineered T cell transduced to express a chimeric antigen receptor (CAR). The CAR T cell can be engineered to bind to BCMA, CD19, CD22, WT1, L1CAM, MUC16, ROR1, or LeY.

A cell sample may be a peripheral blood mononuclear cell sample.

A cell sample may comprise cancerous cells. The cancerous cells may form a cancer which may be a solid tumor or a hematologic malignancy. The cancerous cell sample may comprise cells obtained from a solid tumor. The solid tumor may include a sarcoma or a carcinoma. Exemplary sarcoma cell samples may include, but are not limited to, cell samples obtained from alveolar rhabdomyosarcoma, alveolar soft part sarcoma, ameloblastoma, angiosarcoma, chondrosarcoma, chordoma, clear cell sarcoma of soft tissue, dedifferentiated liposarcoma, desmoid, desmoplastic small round cell tumor, embryonal rhabdomyosarcoma, epithelioid fibrosarcoma, epithelioid hemangioendothelioma, epithelioid sarcoma, esthesioneuroblastoma, Ewing sarcoma, extrarenal rhabdoid tumor, extraskeletal myxoid chondrosarcoma, extraskeletal osteosarcoma, fibrosarcoma, giant cell tumor, hemangiopericytoma, infantile fibrosarcoma, inflammatory myofibroblastic tumor, Kaposi sarcoma, leiomyosarcoma of bone, liposarcoma, liposarcoma of bone, malignant fibrous histiocytoma (MFH), malignant fibrous histiocytoma (MFH) of bone, malignant mesenchymoma, malignant peripheral nerve sheath tumor, mesenchymal chondrosarcoma, myxofibrosarcoma, myxoid liposarcoma, myxoinflammatory fibroblastic sarcoma, neoplasms with perivascular epithelioid cell differentiation, osteosarcoma, parosteal osteosarcoma, neoplasm with perivascular epithelioid cell differentiation, periosteal osteosarcoma, pleomorphic liposarcoma, pleomorphic rhabdomyosarcoma, PNET/extraskeletal Ewing tumor, rhabdomyosarcoma, round cell liposarcoma, small cell osteosarcoma, solitary fibrous tumor, synovial sarcoma, or telangiectatic osteosarcoma.

Exemplary carcinoma cell samples may include, but are not limited to, cell samples obtained from an anal cancer, appendix cancer, bile duct cancer (i.e., cholangiocarcinoma), bladder cancer, brain tumor, breast cancer, cervical cancer, colon cancer, cancer of Unknown Primary (CUP), esophageal cancer, eye cancer, fallopian tube cancer, gastroenterological cancer, kidney cancer, liver cancer, lung cancer, medulloblastoma, melanoma, oral cancer, ovarian cancer, pancreatic cancer, parathyroid disease, penile cancer, pituitary tumor, prostate cancer, rectal cancer, skin cancer, stomach cancer, testicular cancer, throat cancer, thyroid cancer, uterine cancer, vaginal cancer, or vulvar cancer.

The cancerous cell sample may comprise cells obtained from a hematologic malignancy. Hematologic malignancy may comprise a leukemia, a lymphoma, a myeloma, a non-Hodgkin's lymphoma, or a Hodgkin's lymphoma. The hematologic malignancy may be a T-cell based hematologic malignancy. The hematologic malignancy may be a B-cell based hematologic malignancy. Exemplary B-cell based hematologic malignancies may include, but are not limited to, chronic lymphocytic leukemia (CLL), small lymphocytic lymphoma (SLL), high risk CLL, a non-CLL/SLL lymphoma, prolymphocytic leukemia (PLL), follicular lymphoma (FL), diffuse large B-cell lymphoma (DLBCL), mantle cell lymphoma (MCL), Waldenström's macroglobulinemia, multiple myeloma, extranodal marginal zone B cell lymphoma, nodal marginal zone B cell lymphoma, Burkitt's lymphoma, non-Burkitt high grade B cell lymphoma, primary mediastinal B-cell lymphoma (PMBL), immunoblastic large cell lymphoma, precursor B-lymphoblastic lymphoma, B cell prolymphocytic leukemia, lymphoplasmacytic lymphoma, splenic marginal zone lymphoma, plasma cell myeloma, plasmacytoma, mediastinal (thymic) large B cell lymphoma, intravascular large B cell lymphoma, primary effusion lymphoma, or lymphomatoid granulomatosis. Exemplary T-cell based hematologic malignancies may include, but are not limited to, peripheral T-cell lymphoma not otherwise specified (PTCL-NOS), anaplastic large cell lymphoma, angioimmunoblastic lymphoma, cutaneous T-cell lymphoma, adult T-cell leukemia/lymphoma (ATLL), blastic NK-cell lymphoma, enteropathy-type T-cell lymphoma, hepatosplenic gamma-delta T-cell lymphoma, lymphoblastic lymphoma, nasal NK/T-cell lymphomas, or treatment-related T-cell lymphomas.

A cell sample described herein may comprise a tumor cell line sample. Exemplary tumor cell line samples may include, but are not limited to, cell samples from tumor cell lines such as 600MPE, AU565, BT-20, BT-474, BT-483, BT-549, Evsa-T, Hs578T, MCF-7, MDA-MB-231, SkBr3, T-47D, HeLa, DU145, PC3, LNCaP, A549, H1299, NCI-H460, A2780, SKOV-3/Luc, Neuro2a, RKO, RKO-AS45-1, HT-29, SW1417, SW948, DLD-1, SW480, Capan-1, MC/9, B72.3, B25.2, B6.2, B38.1, DMS 153, SU.86.86, SNU-182, SNU-423, SNU-449, SNU-475, SNU-387, Hs 817.T, LMH, LMH/2A, SNU-398, PLHC-1, HepG2/SF, OCI-Ly1, OCI-Ly2, OCI-Ly3, OCI-Ly4, OCI-Ly6, OCI-Ly7, OCI-Ly10, OCI-Ly18, OCI-Ly19, U2932, DB, HBL-1, RIVA, SUDHL2, TMD8, MEC1, MEC2, 8E5, CCRF-CEM, MOLT-3, TALL-104, AML-193, THP-1, BDCM, HL-60, Jurkat, RPMI 8226, MOLT-4, RS4, K-562, KASUMI-1, Daudi, GA-10, Raji, JeKo-1, NK-92, and Mino.

A cell sample may comprise cells obtained from a biopsy sample, necropsy sample, or autopsy sample.

The cell samples (such as a biopsy sample) may be obtained from an individual by any suitable means of obtaining the sample using well-known and routine clinical methods. Procedures for obtaining tissue samples from an individual are well known. For example, procedures for drawing and processing a tissue sample, such as from a needle aspiration biopsy, are well-known and may be employed to obtain a sample for use in the methods provided. Typically, for collection of such a tissue sample, a thin hollow needle is inserted into a mass such as a tumor mass for sampling of cells that, after being stained, will be examined under a microscope.

A cell may be a live cell. A cell may be a eukaryotic cell. A cell may be a yeast cell. A cell may be a plant cell. A cell may be obtained from an agricultural plant.

EXAMPLES

These examples are provided for illustrative purposes only and not to limit the scope of the claims provided herein.

Example 1 Genome Editing Complexes and Gene Regulators with Expanded Repeat Units

This example describes genome editing complexes and gene regulators with expanded repeat units. DNA binding domains (e.g., RNBD, MAP-NBD, TALE) are engineered from a plurality of repeat units and fused to a nuclease disclosed herein (e.g., FokI or SEQ ID NO: 1-SEQ ID NO: 81), an activation domain (e.g., VP16, VP64, p65, p300 catalytic domain, TET1 catalytic domain, TDG, Ldb1 self-associated domain, SAM activator (VP64, p65, HSF1), or VPR (VP64, p65, Rta)), or a repression domain (e.g., KRAB, Sin3a, LSD1, SUV39H1, G9A (EHMT2), DNMT1, DNMT3A-DNMT3L, DNMT3B, KOX, TGF-beta-inducible early gene (TIEG), v-erbA, SID, MBD2, MBD3, Rb, or MeCP2). At least one repeat unit of the DNA binding domain has greater than 39 amino acid residues and binds to a target nucleotide. The expanded repeat unit has altered affinity for its target nucleotide. The DNA binding domain with expanded repeat units exhibits altered binding to a target gene.

Example 2 Genome Editing Complexes and Gene Regulators with Contracted Repeat Units

This example describes genome editing complexes and gene regulators with contracted repeat units. DNA binding domains (e.g., RNBD, MAP-NBD, TALE) are engineered from a plurality of repeat units and fused to a nuclease disclosed herein (e.g., FokI or SEQ ID NO: 1-SEQ ID NO: 81), an activation domain (e.g., VP16, VP64, p65, p300 catalytic domain, TET1 catalytic domain, TDG, Ldb1 self-associated domain, SAM activator (VP64, p65, HSF1), or VPR (VP64, p65, Rta)), or a repression domain (e.g., KRAB, Sin3a, LSD1, SUV39H1, G9A (EHMT2), DNMT1, DNMT3A-DNMT3L, DNMT3B, KOX, TGF-beta-inducible early gene (TIEG), v-erbA, SID, MBD2, MBD3, Rb, or MeCP2). At least one repeat unit of the DNA binding domain has less than 32 amino acid residues and binds to a target nucleotide. The contracted repeat unit has altered affinity for its target nucleotide. The DNA binding domain with contracted repeat units exhibits altered binding to a target gene.
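The length thresholds recited in Examples 1 and 2 (greater than 39 residues for an expanded repeat unit, less than 32 residues for a contracted repeat unit) can be sketched as a simple classifier. This is an illustrative sketch only: the function names are hypothetical, repeat units are assumed to be given as one-letter amino acid strings, and the label "neither" for lengths from 32 to 39 is a placeholder introduced here, not a term from the disclosure.

```python
def classify_repeat_unit(sequence: str) -> str:
    """Classify a repeat unit by its length in amino acid residues."""
    n_residues = len(sequence)
    if n_residues > 39:
        return "expanded"      # greater than 39 residues (Example 1)
    if n_residues < 32:
        return "contracted"    # less than 32 residues (Example 2)
    return "neither"           # placeholder label for 32 to 39 residues


def has_unit_of_kind(repeat_units, kind: str) -> bool:
    """True if at least one repeat unit in the domain is of the given kind."""
    return any(classify_repeat_unit(unit) == kind for unit in repeat_units)


if __name__ == "__main__":
    # Placeholder sequences of poly-alanine, used only to exercise the lengths.
    domain = ["A" * 34, "A" * 40, "A" * 34]
    print(has_unit_of_kind(domain, "expanded"))
```

The helper `has_unit_of_kind` mirrors the "at least one repeat unit" wording of both examples.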

Example 3 Genome Editing Complexes and Gene Regulators with Gapped Repeat Units Having Recognition Sites

This example describes genome editing complexes and gene regulators with gapped repeat units having recognition sites. DNA binding domains (e.g., RNBD, MAP-NBD, TALE) are engineered from a plurality of repeat units and fused via a linker to a nuclease disclosed herein (e.g., FokI or SEQ ID NO: 1-SEQ ID NO: 81), an activation domain (e.g., VP16, VP64, p65, p300 catalytic domain, TET1 catalytic domain, TDG, Ldb1 self-associated domain, SAM activator (VP64, p65, HSF1), or VPR (VP64, p65, Rta)), or a repression domain (e.g., KRAB, Sin3a, LSD1, SUV39H1, G9A (EHMT2), DNMT1, DNMT3A-DNMT3L, DNMT3B, KOX, TGF-beta-inducible early gene (TIEG), v-erbA, SID, MBD2, MBD3, Rb, or MeCP2). Said linker has a recognition site for a small molecule, a protease, or a kinase or serves as a localization signal. Said linker having a recognition site separates each repeat unit from a neighboring repeat unit within the DNA binding domain, and the repeat units are, thus, gapped. Engineered DNA binding domains with gapped repeat units exhibit genome editing or gene regulation activity along with secondary activity.

Example 4 Genome Editing with DNA Binding Domain Comprising Expanded Repeat Units and Fused to a Nuclease

This example illustrates genome editing with a DNA binding domain comprising expanded repeat units and fused to a nuclease. A DNA binding domain (e.g., RNBD, MAP-NBD, TALE) in which at least one repeat unit has greater than 39 amino acid residues is fused to a cleavage domain, such as an endonuclease, to form a genome editing complex. The DNA binding domain is fused to the nuclease, optionally via a naturally occurring linker, a variant or truncation of a naturally occurring linker, or a synthetic linker.

Direct Administration to Introduce a Gene

The genome editing complex is administered directly to a subject in need thereof and is taken up by a cell. The subject has a disease. The DNA binding domain of the genome editing complex binds a region of DNA in a target cell and the cleavage domain induces a double strand break in the DNA of the target cell to introduce a gene. The introduced gene is a mutated gene or a functional gene.

Factor IX. The genome editing complex with a cleavage domain introduces a double strand break into the albumin gene locus (e.g., into intron 1) concomitant with delivery to the cell of an ectopic nucleic acid bearing a cDNA of the factor IX gene. The double strand break leads to the integration of the ectopic nucleic acid into intron 1 of the albumin gene; the factor IX protein is secreted by the cell into the circulation. The target cell is a hepatocyte and the subject in need thereof has Hemophilia B.

Ex Vivo Engineering of a Cell to Introduce a Gene

The genome editing complex is transfected into cells ex vivo along with an ectopic nucleic acid bearing a gene. Upon transfection of cells ex vivo, the DNA binding domain of the genome editing complex binds a region of DNA in a target cell and the cleavage domain induces a double strand break in the DNA of the target cell to introduce an ectopically provided gene (also provided to the cell) in the region cleaved by the genome editing complex. The resulting engineered cells with modified DNA are administered to a subject in need thereof. The subject has a disease.

CAR. The genome editing complex with a cleavage domain introduces a chimeric antigen receptor (CAR) by editing the DNA of a target cell. The target cell is a T cell and the subject has cancer, such as a blood cancer. Upon administration of the engineered cells to a subject, the engineered CAR T cells effectively eliminate cancer in the subject.

Direct Administration to Partially or Completely Knock Out a Gene

The genome editing complex is administered directly to a subject in need thereof and is taken up by a cell. The subject has a disease. The DNA binding domain of the genome editing complex binds a region of DNA in a target cell and the cleavage domain induces a double strand break in the DNA of the target cell to partially or completely knock out a gene.

TTR. The genome editing complex with a cleavage domain partially or completely knocks out the transthyretin (TTR) gene by editing the DNA of a target cell. The target cell is a liver cell and the subject in need thereof has transthyretin amyloidosis (ATTR).

SERPINA1. The genome editing complex with a cleavage domain partially or completely knocks out the SERPINA1 gene by editing the DNA of a target cell. The target cell is a liver cell and the subject in need thereof has alpha-1 antitrypsin deficiency (dA1AT def).

Ex Vivo Engineering of a Cell to Partially or Completely Knock Out a Gene or a Gene Regulatory Region

The genome editing complex is transfected into cells ex vivo. Upon transfection of cells ex vivo, the DNA binding domain of the genome editing complex binds a region of DNA in a target cell and the cleavage domain induces a double strand break in the DNA of the target cell to partially or completely knock out a gene or a gene regulatory region. The subject has a disease.

BCL11A Enhancer. The genome editing complex with a cleavage domain partially or completely knocks out the BCL11A erythroid enhancer by editing the DNA of a target cell. The target cell is a hematopoietic stem cell (HPSC) and the subject in need thereof has beta-thalassemia or sickle cell disease.

CCR5. The genome editing complex with a cleavage domain partially or completely knocks out the CCR5 gene by editing the DNA of a target cell, thereby allowing for introduction of a mutated version of CCR5. Target cells, in which mutated versions of CCR5 are introduced via the action of the genome editing complex, are not infected by HIV via the modified CCR5 receptor. The target cell is a T cell or a hematopoietic stem cell (HPSC) and the subject has HIV.

Upon administration of the genome editing complex directly to a subject or upon administration of an engineered cell with DNA that has been modified with the genome editing complex, the disease symptoms are eliminated or reduced.

Example 5 Genome Editing with a DNA Binding Domain Comprising Contracted Repeat Units and Fused to a Nuclease

This example illustrates genome editing with a DNA binding domain comprising contracted repeat units and fused to a nuclease. A DNA binding domain (e.g., RNBD, MAP-NBD, TALE) in which at least one repeat unit has less than 32 amino acid residues is fused to a cleavage domain, such as an endonuclease, to form a genome editing complex. The DNA binding domain is fused to the nuclease, optionally via a naturally occurring linker, a variant or truncation of a naturally occurring linker, or a synthetic linker.

Direct Administration to Introduce a Gene

The genome editing complex is administered directly to a subject in need thereof and is taken up by a cell. The subject has a disease. The DNA binding domain of the genome editing complex binds a region of DNA in a target cell and the cleavage domain induces a double strand break in the DNA of the target cell to introduce a gene. The introduced gene is a mutated gene or a functional gene.

Factor IX. The genome editing complex with a cleavage domain introduces a double strand break into the albumin gene locus (e.g., into intron 1) concomitant with delivery to the cell of an ectopic nucleic acid bearing a cDNA of the factor IX gene. The double strand break leads to the integration of the ectopic nucleic acid into intron 1 of the albumin gene; the factor IX protein is secreted by the cell into the circulation. The target cell is a hepatocyte and the subject in need thereof has Hemophilia B.

Ex Vivo Engineering of a Cell to Introduce a Gene

The genome editing complex is transfected into cells ex vivo along with an ectopic nucleic acid bearing a gene. Upon transfection of cells ex vivo, the DNA binding domain of the genome editing complex binds a region of DNA in a target cell and the cleavage domain induces a double strand break in the DNA of the target cell to introduce an ectopically provided gene (also provided to the cell) in the region cleaved by the genome editing complex. The resulting engineered cells with modified DNA are administered to a subject in need thereof. The subject has a disease.

CAR. The genome editing complex with a cleavage domain introduces a chimeric antigen receptor (CAR) by editing the DNA of a target cell. The target cell is a T cell and the subject has cancer, such as a blood cancer. Upon administration of the engineered cells to a subject, the engineered CAR T cells effectively eliminate cancer in the subject.

Direct Administration to Partially or Completely Knock Out a Gene

The genome editing complex is administered directly to a subject in need thereof and is taken up by a cell. The subject has a disease. The DNA binding domain of the genome editing complex binds a region of DNA in a target cell and the cleavage domain induces a double strand break in the DNA of the target cell to partially or completely knock out a gene.

TTR. The genome editing complex with a cleavage domain partially or completely knocks out the transthyretin (TTR) gene by editing the DNA of a target cell. The target cell is a liver cell and the subject in need thereof has transthyretin amyloidosis (ATTR).

SERPINA1. The genome editing complex with a cleavage domain partially or completely knocks out the SERPINA1 gene by editing the DNA of a target cell. The target cell is a liver cell and the subject in need thereof has alpha-1 antitrypsin deficiency (dA1AT def).

Ex Vivo Engineering of a Cell to Partially or Completely Knock Out aGene or a Gene Regulatory Region

The genome editing complex is transfected in cells ex vivo. Upontransfection of cells ex vivo, the DNA binding domain of the genomeediting complex binds a region of DNA in a target cell and the cleavagedomain induces a double strand break in the DNA of the target cell topartially or completely knock out a gene or a gene regulatory region.The subject has a disease.

BCL11A Enhancer. The genome editing complex with a cleavage domainpartially or completely knocks out the BCL11A erythroid enhancer byediting the DNA of a target cell. The target cell is an HPSC and thesubject in need thereof has b-thalassemia or sickle cell disease.

CCR5. The genome editing complex with a cleavage domain partially or completely knocks out the CCR5 gene by editing the DNA of a target cell, thereby allowing for introduction of a mutated version of CCR5. Target cells, in which mutated versions of CCR5 are introduced via the action of the genome editing complex, are not infected by HIV via the modified CCR5 receptor. The target cell is a T cell or a hematopoietic stem cell (HPSC) and the subject has HIV.

Upon administration of the genome editing complex directly to a subject or upon administration of an engineered cell with DNA that has been modified with the genome editing complex, the disease symptoms are eliminated or reduced.

Example 6 Genome Editing with DNA Binding Domain Having Gapped Repeat Units and Fused to a Nuclease

This example illustrates genome editing with DNA binding domains fused to a nuclease, wherein the DNA binding domains have gapped repeat units. A DNA binding domain (e.g., RNBD, MAP-NBD, TALE) in which all repeat units are separated from neighboring repeat units with a linker comprising a recognition site is fused to a cleavage domain, such as an endonuclease, to form a genome editing complex. Said linker has a recognition site for a small molecule, a protease, or a kinase, or serves as a localization signal. The DNA binding domain is fused to the nuclease, optionally via a naturally occurring linker, a variant or truncation of a naturally occurring linker, or a synthetic linker.

Direct Administration to Introduce a Gene

The genome editing complex is administered directly to a subject in need thereof and is taken up by a cell. The subject has a disease. The DNA binding domain of the genome editing complex binds a region of DNA in a target cell and the cleavage domain induces a double strand break in the DNA of the target cell to introduce a gene. The introduced gene is a mutated gene or a functional gene.

Factor IX. The genome editing complex with a cleavage domain introduces a double strand break into the albumin gene locus (e.g., into intron 1) concomitant with delivery to the cell of an ectopic nucleic acid bearing a cDNA of the factor IX gene. The double strand break leads to the integration of the ectopic nucleic acid into intron 1 of the albumin gene; the factor IX protein is secreted by the cell into the circulation. The target cell is a hepatocyte and the subject in need thereof has Hemophilia B.

Ex Vivo Engineering of a Cell to Introduce a Gene

The genome editing complex is transfected into cells ex vivo along with an ectopic nucleic acid bearing a gene. Upon transfection of cells ex vivo, the DNA binding domain of the genome editing complex binds a region of DNA in a target cell and the cleavage domain induces a double strand break in the DNA of the target cell to introduce an ectopically provided gene (also provided to the cell) in the region cleaved by the genome editing complex. The resulting engineered cells with modified DNA are administered to a subject in need thereof. The subject has a disease.

CAR. The genome editing complex with a cleavage domain introduces a chimeric antigen receptor (CAR) by editing the DNA of a target cell. The target cell is a T cell and the subject has cancer, such as a blood cancer. Upon administration of the engineered cells to a subject, the engineered CAR T cells effectively eliminate cancer in the subject.

Direct Administration to Partially or Completely Knock Out a Gene

The genome editing complex is administered directly to a subject in need thereof and is taken up by a cell. The subject has a disease. The DNA binding domain of the genome editing complex binds a region of DNA in a target cell and the cleavage domain induces a double strand break in the DNA of the target cell to partially or completely knock out a gene.

TTR. The genome editing complex with a cleavage domain partially or completely knocks out the transthyretin (TTR) gene by editing the DNA of a target cell. The target cell is a liver cell and the subject in need thereof has transthyretin amyloidosis (ATTR).

SERPINA1. The genome editing complex with a cleavage domain partially or completely knocks out the SERPINA1 gene by editing the DNA of a target cell. The target cell is a liver cell and the subject in need thereof has alpha-1 antitrypsin deficiency (dA1AT def).

Ex Vivo Engineering of a Cell to Partially or Completely Knock Out a Gene or a Gene Regulatory Region

The genome editing complex is transfected into cells ex vivo. Upon transfection of cells ex vivo, the DNA binding domain of the genome editing complex binds a region of DNA in a target cell and the cleavage domain induces a double strand break in the DNA of the target cell to partially or completely knock out a gene or a gene regulatory region. The subject has a disease.

BCL11A Enhancer. The genome editing complex with a cleavage domain partially or completely knocks out the BCL11A erythroid enhancer by editing the DNA of a target cell. The target cell is a hematopoietic stem cell (HPSC) and the subject in need thereof has β-thalassemia or sickle cell disease.

CCR5. The genome editing complex with a cleavage domain partially or completely knocks out the CCR5 gene by editing the DNA of a target cell, thereby allowing for introduction of a mutated version of CCR5. Target cells, in which mutated versions of CCR5 are introduced via the action of the genome editing complex, are not infected by HIV via the modified CCR5 receptor. The target cell is a T cell or a hematopoietic stem cell (HPSC) and the subject has HIV.

Upon administration of the genome editing complex directly to a subject or upon administration of an engineered cell with DNA that has been modified with the genome editing complex, the disease symptoms are eliminated or reduced.

Example 7 TALE Protein with N-Terminus Fragment

A DNA binding protein engineered to have a shortened N-terminus derived from a TALE protein was generated. U.S. Pat. No. 8,586,526 shows that while the N-terminus region (referred to as N-cap) from a TALE protein can be shortened by deleting amino acids at the N-terminus, deleting amino acids beyond amino acid position N+134 decreased DNA binding affinity, with the decrease in DNA binding apparent even with deletion of amino acids beyond amino acid position N+137. U.S. Pat. No. 8,586,526 concluded that the amino acids from N+1 through N+137 are required for binding to DNA while the first 152 amino acids of the N-cap sequence are dispensable.

However, it has been discovered that further deleting amino acids through position N+116 surprisingly leads to recovery of DNA binding. Even shorter N-terminus regions, such as a fragment having a deletion through position N+111, also retain DNA binding activity. Deleting amino acids through position N+106 significantly decreases DNA binding. Further deletion of the N-terminus region, such as deleting amino acids through position N+101, does not lead to recovery of DNA binding. See FIG. 2.

TALEN monomers recognizing 5′-TTTCTGTCACCAATCCT-3′ (SEQ ID NO: 449) and 5′-TCCCCTCCACCCCACAGT-3′ (SEQ ID NO: 450) in the human AAVS1 locus were engineered to harbor N-terminus regions that included deletions encompassing residues N137-116, N137-111, N137-106 and N137-101. While these residues are numbered with reference to the N+137 construct in U.S. Pat. No. 8,586,526, N137-116 refers to deletion of amino acids starting at the N-terminus of the N-cap sequence (N+288) and extending through amino acid residue 116 such that the resulting fragment retains amino acid residues from position N+115 to position N+1, and so on. The amino acid sequence of the N-terminal truncation del_N137-116 is set forth in SEQ ID NO: 321. The amino acid sequence of the N-terminal truncation del_N137-111 is set forth in SEQ ID NO: 447.
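The deletion nomenclature above can be illustrated with a minimal Python sketch. The function name `retained_ncap_range` and the label parsing are hypothetical and are provided only to make the numbering convention concrete; the sketch assumes that a label of the form N137-X (optionally prefixed with del_) denotes a fragment retaining positions N+(X-1) through N+1, as described for N137-116.

```python
import re


def retained_ncap_range(label: str) -> tuple[int, int]:
    """Return (highest, lowest) N-cap positions retained for a deletion label.

    Hypothetical helper: assumes labels such as 'N137-116' or 'del_N137-116',
    where the deletion extends through the residue named after the hyphen.
    """
    match = re.fullmatch(r"(?:del_)?N137-(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognized deletion label: {label!r}")
    last_deleted = int(match.group(1))
    # The fragment keeps every residue below the last deleted position.
    return last_deleted - 1, 1


for label in ("N137-116", "N137-111", "N137-106", "N137-101"):
    high, low = retained_ncap_range(label)
    print(f"{label}: retains N+{high} to N+{low}")
```

Under this convention, the four constructs retain N+115, N+110, N+105 and N+100 through N+1, respectively.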

NK562 cells were transfected with 2 µg plasmid DNA for each TALEN monomer using an AMAXA™ Nucleofector™ 96-well Shuttle™ system as per the manufacturer's recommendations. Full length TALEN monomers were included (“AAVS1 control”), together with N137-116/full length and full length/N137-116 heterodimers. Cells were cold shocked at 30° C. and genomic DNA was harvested at 72 h using QuickExtract™ (Lucigen). Indel rates were determined by amplicon sequencing. The TALE repeats present in the TALE monomers have the sequence LTPDQVVAIAS(RVD)GGKQALETVQRLLPVLCQDHG (SEQ ID NO: 451), with an RVD selected based on the target sequence.
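By way of illustration only, selection of an RVD for each position of a target sequence can be sketched in Python as follows. The repeat scaffold is the sequence of SEQ ID NO: 451. The mapping `RVD_FOR_BASE` is an assumption: it picks one RVD per base from pairings recited in the claims (HD for cytosine, NG for thymidine, NK for guanine, SI for adenosine) and is not asserted to be the set used in this example. The function name `repeats_for_target` is hypothetical, and the sketch assigns one repeat per base without modeling half repeats or any position handled by the N-cap.

```python
# Scaffold of the TALE repeat (SEQ ID NO: 451), split around the RVD.
REPEAT_PREFIX = "LTPDQVVAIAS"
REPEAT_SUFFIX = "GGKQALETVQRLLPVLCQDHG"

# Assumed one-RVD-per-base choice, taken from pairings recited in the claims.
RVD_FOR_BASE = {"C": "HD", "T": "NG", "G": "NK", "A": "SI"}


def repeats_for_target(target: str) -> list[str]:
    """Return one repeat unit sequence per base of the target (5' to 3')."""
    repeats = []
    for base in target.upper():
        if base not in RVD_FOR_BASE:
            raise ValueError(f"unsupported base: {base!r}")
        repeats.append(REPEAT_PREFIX + RVD_FOR_BASE[base] + REPEAT_SUFFIX)
    return repeats


# Target half-site of SEQ ID NO: 449.
units = repeats_for_target("TTTCTGTCACCAATCCT")
print(len(units), "repeat units of", len(units[0]), "residues each")
print(units[0])
```

Each repeat unit built this way is 34 residues long (11 scaffold residues, the 2-residue RVD, and 21 scaffold residues).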

FIG. 2 represents DNA binding activity assayed by measuring nuclease activity of FokI fused to the C-terminus of the polypeptides. The AAVS1 control data set corresponds to TALENs using the standard full-length N-terminus (N+288 to N+1). N-terminal truncation del_N137-116 (N-terminus extending from N+115 to N+1) showed higher activity than the standard full-length N-terminus (N+288 to N+1). N-terminal truncation del_N137-111 (N-terminus extending from N+110 to N+1) was also active. Further truncation del_N137-106 (N-terminus extending from N+105 to N+1) significantly decreased DNA binding. Further deletion of the N-terminus region, del_N137-101 (N-terminus extending from N+100 to N+1), did not lead to recovery of DNA binding. Thus, a fragment of the N-terminus of a TALE protein extending from N+115 to N+1 shows full activity. Mock/GFP is a negative control. The AAVS1/del_N137-116 data show that an N1-115 TALEN monomer can be combined with a monomer comprising the full-length N-terminus region of a TALE protein.

While preferred embodiments of the present invention have been shown and described herein, it will be apparent to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention may be employed in practicing the invention. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

1.-99. (canceled)
100. A polypeptide comprising a modular nucleic acid binding domain comprising a plurality of repeat units, wherein a repeat unit of the plurality of repeat units comprises a sequence A₁₋₁₁X₁X₂B₁₄₋₃₅ (SEQ ID NO: 448), wherein each amino acid residue of A₁₋₁₁ comprises any amino acid residue; X₁X₂ comprises a binding region configured to bind to a target nucleic acid base in a target site; and B₁₄₋₃₅ has at least 92% sequence identity to GGKQALEAVRAQLLDLRAAPYG (SEQ ID NO: 280), and a first repeat unit of the plurality of repeat units comprises at least one residue in A₁₋₁₁, B₁₄₋₃₅, or a combination thereof that differs from a corresponding residue in a second repeat unit of the plurality of repeat units.
101. The polypeptide of claim 100, wherein the at least one repeat unit comprises the amino acid sequence of any one of SEQ ID NO: 267-SEQ ID NO: 279.
102. The polypeptide of claim 100, wherein the at least one repeat unit comprises at least 80% sequence identity with any one of SEQ ID NO: 203, SEQ ID NO: 209, SEQ ID NO: 197, SEQ ID NO: 233, SEQ ID NO: 253, or SEQ ID NO: 218.
103. The polypeptide of claim 100, wherein the at least one repeat unit comprises at least 80% sequence identity with any one of SEQ ID NO: 168-SEQ ID NO: 263.
104. The polypeptide of claim 100, wherein each repeat unit of the plurality of repeat units is separated from a neighboring repeat unit by a linker comprising a recognition site.
105. The polypeptide of claim 104, wherein the recognition site is for a small molecule, a protease, or a kinase.
106. The polypeptide of claim 104, wherein the recognition site serves as a localization signal.
107. The polypeptide of claim 100, further comprising one or more of: (a) at least one repeat unit comprising greater than 39 amino acid residues; (b) at least one repeat unit comprising greater than 35 amino acid residues derived from the genus of Ralstonia; and (c) at least one repeat unit comprising less than 32 amino acid residues.
108. The polypeptide of claim 107, wherein the at least one repeat unit comprises an amino acid selected from glycine, alanine, threonine or histidine at a position after an amino acid residue at position 39.
109. The polypeptide of claim 107, wherein the at least one repeat unit comprises an amino acid selected from glycine, alanine, threonine or histidine at a position after an amino acid residue at position 35.
110. The polypeptide of claim 100, wherein the polypeptide further comprises a cleavage domain linked to the modular nucleic acid binding domain to form a non-naturally occurring fusion protein.
111. The polypeptide of claim 100, wherein the modular nucleic acid binding domain further comprises one or more properties selected from the following: (a) binds the target site, wherein the target site comprises a 5′ guanine; (b) comprises from 7 repeat units to 25 repeat units; and (c) upon binding to the target site, the modular nucleic acid binding domain is separated from a second modular nucleic acid binding domain bound to a second target site by from 2 to 50 base pairs.
112. The polypeptide of claim 100, wherein the binding region comprises HD for binding to cytosine, NG for binding to thymidine, NK for binding to guanine, SI for binding to adenosine, RS for binding to adenosine, HN for binding to guanine, or NT for binding to adenosine.
113. The polypeptide of claim 100, wherein the modular nucleic acid binding domain comprises an N-terminus amino acid sequence from Xanthomonas spp., Legionella quateirensis, or Ralstonia solanacearum.
114. The polypeptide of claim 100, wherein the modular nucleic acid binding domain comprises a C-terminus amino acid sequence from Xanthomonas spp., Legionella quateirensis, or Ralstonia solanacearum.
115. The polypeptide of claim 100, wherein the modular nucleic acid binding domain comprises an N-terminus amino acid sequence and a C-terminus amino acid sequence from Xanthomonas spp., Legionella quateirensis, or Ralstonia solanacearum.
116. The polypeptide of claim 113, wherein the N-terminus amino acid sequence comprises at least 80% sequence identity to SEQ ID NO: 264, SEQ ID NO: 300, SEQ ID NO: 335, SEQ ID NO: 303, SEQ ID NO: 301, SEQ ID NO: 304, SEQ ID NO: 320, SEQ ID NO: 321, or SEQ ID NO: 322.
117. The polypeptide of claim 114, wherein the C-terminus amino acid sequence comprises at least 80% sequence identity to SEQ ID NO: 266, SEQ ID NO: 298, or SEQ ID NO: 306.
118. The polypeptide of claim 100, wherein the modular nucleic acid binding domain comprises a half repeat comprising at least 80% sequence identity to SEQ ID NO: 265, SEQ ID NO: 327-SEQ ID NO: 334, or SEQ ID NO: 290.
119. A method of modulating expression of an endogenous gene in a cell, the method comprising: introducing into the cell the polypeptide of claim 100, wherein the DNA binding polypeptide binds to a target nucleic acid sequence present in the endogenous gene and the heterologous functional domain modulates expression of the endogenous gene.